 Hi, I'm Drew, and my colleagues and I would like to welcome you to Java Programming Solving Problems with Software. We here at Duke are so excited that you're taking this first step in learning to solve real problems using Java. In this course, you will learn a seven-step process designed to help you understand how to approach any programming problem. You'll use this process to solve real problems, and you'll learn that computer science is so much more than the syntax of a programming language like Java. You'll have a chance to work on problems such as analyzing DNA, manipulating CSV files, and processing images. These are real problems that engineers, scientists, programmers, and others work on in real life, and you too will be able to tackle these problems as you begin to learn Java. I'm Susan. In this course, you'll learn to program in Java using techniques that can be used with simple programs, but that can also scale to larger programs and larger problems. The libraries and APIs we introduce make it easy to process data in many formats. You'll be able to use these same techniques, tools, and libraries in solving the problems we've designed for you. Problems whose solutions require the programming knowledge that you'll learn here. I'm Robert. As you learn about the syntax and semantics of Java programs, you will practice with the programming environment that's been specifically designed and proven to help learners, like you, who are getting started in programming. This programming environment will let you design, test, run, and debug your programs using techniques that software engineers, scientists, and programmers apply as they design, create, and solve problems using Java and other languages. This programming environment can scale to large problems and is a great first step as you learn to master increasingly sophisticated concepts. I'm Owen, and I'm really excited about the problems we've created for this course. 
We've used our collective years of experience to simplify problems and to provide you with opportunities to demonstrate your mastery of Java programming while you work on real problems that are only slightly simplified from those problems faced every day by those working in the many fields that use computational and programming approaches. We've designed our Java libraries in a similar way, using standard Java idioms that you'll see if you continue to study programming, but that are more easily used by those just getting started with Java. Once again, welcome to Java Programming, Solving Problems with Software. See you in the course. Hi, I'm Elizabeth. I'm part of the instructor team here at Duke University. Before you get started with this course, I want to make sure you're aware of some important resources and give you some tips for doing well. The assignments in this course will be programming exercises, so you'll practice writing code. Anything labeled programming exercise here in the course content is an assignment and contains instructions to help you write your own program. When you've finished writing your code, there will be a practice quiz where you can check your program works properly by comparing your results to answers provided by the instructors. I also want to show you the course site, DukeLearnToProgram.com. You can see we have a page for each course and a page of frequently asked questions about the specialization. This has everything from certificates to the software we use in the course. If you go back to the homepage and select the course you're working on, you'll get to that course's main page. So I'm using course two here as an example. What I want to point out now are project resources, documentation, and the frequently asked questions page. Project resources is where you can download code to follow along with the video lectures or to begin the assignments. Documentation has a summary of the Java methods you'll learn in this course. 
This is useful if you forget the name of a method or if you want to find out if there's a Java method to accomplish a particular task. It's not an exhaustive list of all Java methods. It's just a summary of the most useful ones for this course. Finally, the frequently asked questions or FAQ page contains questions specific to this course. For questions about the specialization as a whole, click this link up here. So hopefully this video has given you an idea of how the course is structured and what resources you'll need to know about. If you have any feedback about how we can make these resources more useful to you, please let us know in the discussion forums on Coursera. To help you do your best, we want to give you some suggestions about how to learn in the course. First, do a little each day. It's really hard to learn programming all at once. If you do a few course items each day instead of trying to do it all in a day or two, you'll remember things better, you'll be more motivated, and you'll have more time to work through problems in your code. Speaking of problems with code, also known as bugs, it's normal to make mistakes when you're programming. So our next tip is, don't give up. Everyone gets bugs in their programs, and part of programming is figuring out what's wrong and how to fix it. When you're programming, we really recommend following the seven-step process. This means you should plan how to solve the problem before you start writing any code. If you didn't take our first course, don't worry. There'll be a chance to review the seven-step process later. The seven-step process is important because it gives you a method for solving problems. Then when you've figured out a solution, you can start writing code. Once you're ready to start writing your programs, make sure you've read the relevant documentation so you know what Java methods exist and how to use them. Refer back to the documentation as often as you need to. 
Next, take advantage of the live coding videos and the practice quizzes. For the live coding videos, this is a great opportunity to program alongside the instructors. You can also download the code from the videos and run it yourself. Try making small changes to make sure you really understand what each part of the program does. Finally, for the practice quizzes, even though they don't contribute to your final grade, there's still a good chance for you to test your code. Use the practice quizzes to find and fix problems before you move on to the graded quizzes. Finally, if you're still having trouble with your programs, ask for help from the instructor team and your peers in the course discussion forums. Part of being a good programmer is knowing how to ask for help effectively. We'll talk more about that in the next video. It's completely normal to struggle with learning programming. And when that happens, the instructor team and your peers in the courses are here to help. The best way to ask for help is through the discussion forums, which you can find here. You can also answer questions in the discussion forums. And we really encourage that because explaining programming concepts is a great way for you to learn, and you also get to help out one of your classmates. Here are some general tips about asking questions in the forums. First, before you start a new topic, you should check the FAQ pages on DukeLearnToProgram.com and existing forum threads to see if your question has already been answered there. The first thing you should do when you have a question about a programming assignment or a quiz is check the FAQ page for that course. Second, start a new thread if you have a new question. Don't post your question as a reply to an existing thread unless it's really closely related. This way it's easier for people to see what your question is about and help you more quickly. 
Third, if you need to post code, use the code formatting box, which is the one with this symbol on. This is much easier to read than if you just copy and paste it directly into the post. We also want to give you some tips on writing a good question so that others can help you most easily. If you're having trouble with the programming assignment, name and link to the assignment you're working on. If your program is throwing an exception, you can post a screenshot of the error message and also the line of code that it occurs at. If your program is producing a result that's different from what you expected, make sure you say what input you're running your code on, the output you expected to get, and the output you actually got. For example, suppose I'm trying to write a program to change every green pixel in an image to blue. I should share the image I'm running my program on, my input. I should explain that I'm expecting every green pixel to become blue, my expected output. And I should also explain that every green pixel is actually becoming red, my actual output. Then others in the forum can better understand the problem I'm having. If you have to share some code because others can't help based on the description of what your program is doing, it's OK to share a few lines of code, but don't share your whole program. Figure out which part of your program you think the error is in, and share a few lines of that method. Don't copy and paste your whole program into your post. If you have a general programming question, such as how do I write a for loop, or how do I add items to a list, it's OK to post those lines of code because they're so general. Finally, if you have a conceptual question, make sure you name and link to any course materials you refer to. When you're answering questions, it's OK to share code for general programming concepts, such as how do I write a for loop. 
However, if someone is having a problem in their code, don't give them the solution. Try to guide them towards fixing their code themselves by giving hints. If you don't know what their problem is, suggest what they might do next to debug their code. Let's look at some example posts. In this post, I didn't really ask for help very well. I said my code wasn't working, and I asked if anyone knew what might be wrong, but I didn't really explain what the code is trying to do, what it's actually doing, or what troubleshooting I had tried so far. I also posted a lot of code. So I edited my post, and now it's much better. You can see that I explained which assignment I'm working on, and I provided a link to it. I also explained what happens when I run my program, so my actual output, and what should have happened when I run it, so my expected output. Finally, I only posted a few lines of code that I thought the problem was in. Now that you've learned how to ask and answer questions about your code, you're ready to start learning how to program. I hope you enjoy the course, and I look forward to interacting with you all in the forums. Good luck. Hello. We wanted to take a moment to tell you about a specialization created by instructors from Duke University and the University of California, San Diego. It's called Object Oriented Programming in Java. I'm Owen Astrachan, one of the instructors from Duke University, and I'll be helping you learn this specialization. We're going to start from the beginning with Java and learn how to use it to write programs to solve a wide variety of problems. I'm Susan Rodger, and I'm another one of your instructors from Duke University. We'll start out with the basics in Java. So while we hope you have some programming experience already, we're going to assume you don't know anything about Java, but that you are eager to learn. The next course of this specialization is Java Programming: Arrays, Lists, and Structured Data. 
I'm Robert Duvall, and I'm excited to be teaching you about Java. In our next course, you'll dive more deeply into Java and learn to store data in more complex ways, allowing you to solve even more interesting and exciting problems. I think there's even a dinosaur in one of our examples. And I'm Drew Hilton, and I'm your fourth instructor from Duke. After you learn these Java fundamentals with us, our friends from UCSD will take over and teach you even more exciting things about Java and object-oriented programming. We'll let them introduce themselves to you now, and then you'll see them again in a couple of months. Hi, I'm Mia Minnes. I'll be one of your UC San Diego instructors. You'll meet us when you get to the third course, Object-Oriented Programming. In this course, we'll build on the programming concepts you learned from our friends at Duke, and then we'll also talk more about the object-oriented nature of Java and how you can use it to build bigger programs to solve more complex problems. Hi, I'm Leo Porter, and I'm an instructor at UC San Diego. Another topic that you'll be learning in the third course of the specialization is graphical user interfaces and event handling. With these skills, you'll be able to build interactive applications that are both easy and intuitive to use. I'm Christine Alvarado, the final instructor on the UC San Diego team for this specialization. I want to tell you a little about the fourth course in the specialization. In that course, you'll learn how to store data more efficiently so your program can quickly perform operations on large sets of data. By the time you finish that course, you'll be a pretty good Java programmer. So let's get started. Hi, in this lesson, we're going to introduce you to Java programming and the BlueJ environment we use in this course to design, develop, test, and run Java programs. Java is a popular and widely used language. 
It is a foundation for the Android operating system that powers more smartphones than any other system in the world. Java is an extremely useful and quite powerful language with extensive support on almost every kind of computer. Java is also accessible to beginners, which is why we're using it in our courses. At the end of this lesson, you'll have experience with BlueJ, the development environment we use for programming with Java. You'll be able to use BlueJ to compile and run a Java program. You'll know the basic design, edit, compile, and execute cycle that's part of programming with hundreds of different languages. And you'll see firsthand how simple and powerful Java is in accessing information that will print Hello World in many languages. This is our take on a traditional first program. We'll explain that more as you begin your exploration of Java programming. So, Hola Mundo, Konnichiwa, Zdravstvuyte, Nihao, and Hello World. Let's get started. Hello, world of learners. In this video, we will look at how code is organized in Java and how programs are executed by the computer. We will also demonstrate how to run programs in BlueJ, the programming environment you will use in this course. Java is an object-oriented language. This means you'll use classes and objects in writing your code. Classes are a way of organizing your programs, and objects are created using classes when your program runs. We'll learn more about objects and object-oriented programming in the next course. Classes have a .java file extension. In a Java class, you will write one or more Java methods, instructions for your computer to carry out when you run your program. The code you write is called source code. Source code is high-level code, which is human-readable, but not machine-readable. So when I open this Java class here, I can read the Java program one of my fellow instructors wrote in it. 
Now, in order for the computer to run my program, my source code must be translated into low-level bytecode, which is machine-readable. Bytecode files have a .class file extension. This process of translating source code held inside classes into bytecode is called compilation. When you write Java programs, before you run your program, you will need to compile it. So where do you write this code that will eventually be run by the computer? Programmers write code in programming environments. In these courses, we'll be using a particular environment called BlueJ. We chose BlueJ because it's a great programming environment for novices. It allows you to start programming without having to worry about editor complexities. And we've added some special features that you'll use as you develop Java programs for this course. You're going to run your first Java program. We're going to show you how to download it from the Duke Learn to Program website, and then open up BlueJ and run the program. So I have the Duke Learn to Program website up right here. And we are going to go to course two and click on there. There's lots of resources that we have provided for you on this website. Since we're going to run a particular project, we'll click on Project Resources. And here, you can see our first program called Hello World, the BlueJ project. I'm going to click on that, and that is going to give you the Java program and the data file. So if you just click on that, it's a zip file. And you will just unpack it, and you'll have everything there. So I am now going to start BlueJ. So you should have already installed BlueJ. I'm going to click here, and BlueJ is going to start. And for me, I've already got the program there, but if you don't see it, you may have to click up here on Project and Open Project. And you'll have to go to where your project downloaded to, which folder it's in. Once you have it there, you'll see Hello World. So let's click on that and see what's there. 
This is your Java file. And here, you can see the code for this. I'm just going to scroll over so we can see a little bit more. So this is a class, Hello World, and it has one method in it called Run Hello. And this is a very simple program. What we're going to do is we are just going to open up a file, and we're going to print every line in the file. Each line in the file is a greeting from some country. So we are going to create a file resource, and you can see we're going to tie it to the file called hello underscore unicode.txt. We are going to assign it to a file resource called res. And then what we're going to do is we're going to loop over the file. So we have a for loop where we are going to, every time we call the file resource res, res.lines, it gives us another line from the file. We're going to assign that line to the variable called line. And then you can see the line inside the for loop is going to print that line. So that's what our program does. In order to run the program, we are going to right click on hello world here. You can see all the diagonal lines there. That means we haven't compiled the program yet. You need to compile the program so your computer can understand it. So we will right click here, and there's a compile. So we'll compile it. If everything works good, you see that the slanted lines disappear except for just the two in the right corner. And then that means it's compiled. It's created the class file, which is a machine readable code that the computer understands. So now we can right click on that and create an instance or an object. You can give it whatever name you want. I'm just going to leave that name there. And so the object has been created, and now we can run it. So I'm going to right click on here, and you can see that method run hello is right there. I'm going to run it. Voila, we have just printed all the lines from the file. And you can see all these wonderful greetings here. 
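The program just described can be sketched as follows. This is only a rough stand-in, not the course's actual file: the course code uses its own file resource library, while this sketch swaps in the standard `java.nio.file.Files` class so it runs on its own. The class name `HelloWorldSketch`, the helper `printLines`, and the sample greetings written to a temporary file are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class HelloWorldSketch {
    // Prints every line it is given, one per output line, and returns how many it printed.
    // This plays the role of the for loop over the file's lines in the video.
    static int printLines(Iterable<String> lines) {
        int count = 0;
        for (String line : lines) {
            System.out.println(line);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a small temporary file stands in for hello_unicode.txt.
        Path file = Files.createTempFile("hello_unicode", ".txt");
        List<String> greetings = Arrays.asList("Hello World", "Bonjour", "Guten Tag", "Aloha");
        Files.write(file, greetings, StandardCharsets.UTF_8);

        // Open the file, go through it one line at a time, and print each line.
        printLines(Files.readAllLines(file, StandardCharsets.UTF_8));

        Files.delete(file);
    }
}
```

In BlueJ you would compile a class like this and then invoke a method on it, just as the video does with the real project.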
We have hello, hello, bonjour, guten tag, aloha, all of these nice greetings. Now I want to show you where these come from. There is a file that you are also downloading, and I have a copy of it right over here. So this is the file, hello underscore unicode.txt, and you can see the exact same lines that we printed here. All here, bonjour, guten tag, aloha, all of these are right here. So our program again, let me just come back to our program right here. Again, just real quick, we open up the file. You saw the file. We go over the file. We create a file resource, and go through one line at a time in the file. We grab the line, and we print the line, and that's it. So hopefully, you've gotten your program to run your first Java program, and you've enjoyed it. Thank you. Hello, we're going to learn how to calculate the perimeter of shapes using Java. We'll use what is called a Java class, which you'll learn more about soon, named shape, to model polygons in geometry. These kinds of shapes are used in many applications. For example, a triangle is the simplest polygon or shape. It's simply a collection of three points. But triangles can be combined into complex shapes that are colorful or form the basis of wireframe diagrams used in computer graphics and video games. We'll use a shape Java class to represent a collection of points. We'll use a class for point and for shape to understand programming concepts that are applicable across many programming languages. We'll need to construct shapes by adding points one at a time. This will allow us to create shapes like this six-sided shape, which is constructed from six points. Here is a shape with five points. Here is another six-sided shape. The order in which the points are added to the shape is important. When we look at the code for the Java shape class, we'll also need methods for accessing the points, either one at a time or using other approaches. 
We'll likely want to create operations or methods that use shapes, like drawing a shape or finding the perimeter of a shape. We've seen that shapes can have many points. Our shape class is simple, but we could expand it to create complex shapes of many, many points, like this butterfly. When viewed more closely, it's clear that the butterfly is simply a collection of colored triangles, like the wireframe drawing we saw earlier. The shape class we'll look at is very simple. Circles aren't really a finite collection of points, so circles and some other shapes are harder to model using our Java class. In geometry, a circle is an infinite collection of points that are all the same distance from the center of a circle. Our Java class is just a finite collection of points. However, even with our simple class, we could model a circle using many points, as shown in the diagram here. So even our simple class will prove very powerful. After we cover some of the basics of writing programs in Java, we'll explore the point and shape classes in more detail. OK, you will be using the Java classes shape and point. But what will you be able to do with these classes? Well, in some ways, you know the answer to that. You will be able to draw shapes or calculate a shape's perimeter. But just knowing what a Java class or script can do does not mean you understand it. So the question we are going to answer now is, how would you understand what this code does by yourself? That is, what are the semantics or meaning of each part of the code? Understanding the precise meaning of code is important because you can't write code without saying precisely what you mean. When we talk about understanding the semantics of code, what exactly do we mean? We mean, how would you execute the code by hand with nothing but pencil and paper? This skill is very important for a couple of reasons. First, it's how you understand code well enough to write what you mean. 
Second, when your code does not behave as you expect, how do you figure out what is going wrong? You need to understand what it does. And this gives you the skills to do that. Hello. We're going to start learning to execute code with the most basic statements in Java and build up to more complex statements from there. Here we have two variable declarations, one for an int called x and one for an int called y. If we execute these statements as is, we will create boxes labeled x and y to hold the values we eventually decide to put in these variables. But in this example, we have not explicitly provided any initial values. In some languages, such as C, no default value is ever provided when you declare a variable, meaning you get undefined behavior if you use an uninitialized variable. This is such a common and significant problem that Java provides two solutions. An initial default value of 0 is given to instance variables, or an explicit error is given for using a local variable before initializing it. We will see these different types of variables in upcoming videos. Now let us look at another example, which initializes the variables when they are declared. Here, the first statement declares an int called x. Executing it makes a box for x and immediately assigns the value four to x. We show this by putting four in the box for x. Executing the next line both creates a box for y and puts the value six in that box. Combining the declaration and initialization of your variables into the same statement won't pose a problem in any language, and anyone reading your code knows exactly what you meant for the code to do. So this is a good habit to get into. Of course, variables are only useful if you make use of the value that they have. Here, the first statement declares x and initializes it to four. The next statement declares y and initializes it to x plus two. To perform this statement, we must evaluate the expression on the right hand side of the equal sign. 
x is four, so x plus two is four plus two or six. We show this by creating a box for y and putting six into it. The last statement creates a variable z and initializes it to y minus x. So you would take the value of y, which is six, and the value of x, which is four, and subtract them to get two. You would then create a box for z and initialize it to two. Great. Now you know how to execute code with variable declarations and assignment statements. Now that you know a little bit more about expressions, let's see them in action. This code example starts with a declaration of an integer or int variable called x, which behaves exactly as you've already seen. Next, we initialize x to four plus three times two. As you know from math, times has higher precedence than plus. So this expression evaluates to four plus six, which is 10. So we put 10 in the box for x. Next, we declare another variable y of type int and initialize it to x minus six, which is 10 minus six, which is four. So we create a box for y and put four in it. The last statement says x gets x times y. Sometimes novice programmers expect statements which look like this to behave like algebraic equations with an equal sign, where you might solve for x. However, that's not what happens. Instead, you follow the rules you've already learned. The right-hand side evaluates to 10 times four, which is 40. And you put 40 in the box for x. Now let's see another example. Before we work through this, take a moment to pause the video and see if you can figure out what values x, y, and z have at the end of this code fragment. OK, let's step through it. First, we declare and initialize x. Next, we evaluate x times three, which is six, and initialize y to that value. Next, we compute y divided by two, which is three, and initialize z to that value. The last statement says x gets two plus z mod two. Since the two plus z is in parentheses, we compute that first and get five. Next, we compute five mod two. 
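For reference, the three expression examples stepped through here can be written out as a small program, offered as a sketch rather than the exact code shown on screen. The class and method names are made up for illustration, and the starting value of x in the third example (two) is inferred from the narration that x times three is six.

```java
import java.util.Arrays;

class ExpressionTrace {
    // First example: each variable is initialized from the ones declared before it.
    static int[] declareAndUse() {
        int x = 4;
        int y = x + 2;      // 4 + 2 = 6
        int z = y - x;      // 6 - 4 = 2
        return new int[] {x, y, z};
    }

    // Second example: precedence, then reassigning x from its own old value.
    static int[] precedenceAndReassign() {
        int x;
        x = 4 + 3 * 2;      // times binds tighter than plus: 4 + 6 = 10
        int y = x - 6;      // 10 - 6 = 4
        x = x * y;          // right-hand side is evaluated first: 10 * 4 = 40
        return new int[] {x, y};
    }

    // Third example: integer division and the remainder (mod) operator.
    static int[] divideAndMod() {
        int x = 2;          // inferred starting value
        int y = x * 3;      // 6
        int z = y / 2;      // 3
        x = (2 + z) % 2;    // parentheses first: 5 % 2 = 1
        return new int[] {x, y, z};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(declareAndUse()));
        System.out.println(Arrays.toString(precedenceAndReassign()));
        System.out.println(Arrays.toString(divideAndMod()));
    }
}
```

Each array lists the final contents of the boxes you would draw by hand, so you can compare your paper trace against it.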
Remember from your reading that five mod two means we divide five by two, but take the remainder, not the quotient. So this expression evaluates to one. So we update the value in x's box to be one. OK, great. Now you should be able to evaluate code involving a wide variety of mathematical expressions. The next important idea to understand is function calls. Functions abstract a computation out, giving it a name and parameters. You then can use the function to perform that computation without rewriting it. You can also think about what the function does and not how it does it. Technically speaking, Java doesn't have functions. It has methods, since all code in Java is inside of objects. However, before we learn the more complex behavior of objects and methods, we'll learn the basic principles of function calls. These concepts will then lay the foundation for understanding method calls. We have three functions here. My function, f, and g. Why are we starting at g? We'll assume for now that you chose to call g from the BlueJ interface. Later, you'll learn about the main method, which is where programs start when you run them outside of BlueJ. We start with the frame for g and with the execution right at the start of that function. The first line declares a. So we create its box inside of the frame for g. Next, we're going to set a equal to the value of the function called my function of three seven. To evaluate this expression, we need to create a frame of the function that we're calling. In this case, my function. This will hold the parameters and variables of my function. Next, we pass the parameters to my function. We create a box for each parameter with the names coming from the function declaration, x and y. We initialize these by copying in the values of the expressions that were passed, here, three and seven. Next, we need to know where to return when we finish executing my function. This location in the code is named the call site. 
The place where the function was called. We'll note it with the marker one in the code and put the same marker in the corner of the frame. Finally, we move the execution arrow into my function and start executing code there. Here, we declare and initialize z, evaluating the expression two times x minus y. The values for x and y come from the frame for my functions, three and seven, respectively. So z will be negative one. Now we have reached a return statement. Return statements tell us to leave the current function returning to the call site noted in the frame. They also tell us the value to return to the caller. The first thing we need to do is evaluate this expression to obtain the return value. Here, the expression is z times x. So we evaluate negative one times three and we get negative three. Next, find where we should return. This is the call site we noted in the frame. Then we copy the return value back to the call site. The function call evaluates to this return value. We move the execution arrow back to the call site and destroy the frame for the function we just returned from. Now we're back in g. The call to my function evaluated to negative three. So this line behaves like a gets minus three. We'll finish that assignment putting minus three in its box. Our next line, again, has a variable declaration and a function call we make a box for b and go through the same process to call f. We make a frame for f and pass parameters. This time there's one parameter n whose value is a times a, which is nine. We note where to return and begin executing code inside of f. Our next statement is a return statement, but the expression involves a function call. So we have to evaluate that call before anything else. We start with a frame and pass parameters. x gets a value of n, which is nine, and y gets a value of n plus one, which is 10. 
We know the call site will use two this time since we are already using one somewhere else and move the execution arrow to the start of my function and start executing code there. We declare z and initialize it to a. Now we are ready to return from my function. We evaluate z times x, which is 72. Then we find the call site noted in the frame and copy the return value there. Finally, we move our execution arrow back to the call site, destroying the frame for my function. Now we pick up where we left off in f using 72 for the value of the call to my function. We evaluate three plus 72 to get 75. Since we're evaluating the return statement, this is the return value of f. So we find the call site and copy the return value there. Then we return to that call site, destroying the frame for f. Now we can finish the initialization of b. b gets 75. Lastly, we reach the return statement for g, which is where we started. When we return from the function where we started, we are done. Now let's see how to execute code with conditional statements in it. Here we have a function f, which has some if and if else statements. We have another function g. We'll assume we've used BlueJ's interface to invoke the method or function g to start with. The first statement declares a variable a. So we'll make a box for a. Even though this statement initializes a, it's gonna take us several steps to compute that value. So we're gonna leave a's value is zero until we compute the value of the method f applied to arguments three and four. To evaluate this call to f, we make a frame passing the values of the parameters. Note where to return. And then we move the execution arrow to the first line of f. The next line of code is an if statement, whose conditional expression is x less than y. We evaluate that expression and find three less than four is true. So I move the execution arrow into the then clause of this if statement and continue executing. 
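Looking back at the function-call walkthrough that ended with g returning, the three functions might look roughly like the sketch below. It is reconstructed from the narration, not copied from the slide: the identifier `myFunction`, the class name, and especially the value that g returns are assumptions, since the narration never says what g returns.

```java
class FunctionCallTrace {
    // Called twice in the walkthrough: once from g (call site 1), once from f (call site 2).
    static int myFunction(int x, int y) {
        int z = 2 * x - y;
        return z * x;
    }

    static int f(int n) {
        // Call site 2: myFunction(9, 10) gives z = 8 and returns 72, so f(9) returns 75.
        return 3 + myFunction(n, n + 1);
    }

    static int g() {
        // Call site 1: myFunction(3, 7) gives z = -1 and returns -3.
        int a = myFunction(3, 7);
        // a * a is 9, so this is f(9), which is 75.
        int b = f(a * a);
        // Assumption: the narration does not say what g returns, so b is returned here.
        return b;
    }

    public static void main(String[] args) {
        System.out.println(g());
    }
}
```

Tracing this by hand with one frame per call should reproduce the same intermediate values the walkthrough arrives at.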
The next statement is a call to System.out.println, which is how you print something in Java. We write x less than y in our output. And then we return y plus x. This return follows the same rules you've already learned for return statements. We evaluate the expression to get the return value, which is seven, which is then what the function call evaluates to, and we return back to the caller, destroying the frame for f. Now we're ready to finish initializing a, since we know that the call to the method f evaluated to seven. So a is now seven. Now we're ready for the next line of code in g. We make a box for b and we call f passing in the arguments seven and five. Once again, we want to evaluate the condition x less than y. However, now x is seven and y is five. So x less than y is false. We find the closed curly brace for this if statement and see that the if statement has an else clause. So we move the execution arrow into the start of the else clause and continue executing from there. First, we see the System.out.println statement and this prints the line x is greater than or equal to y. Then we reach another if statement. This if statement is nested inside of the else, but that doesn't affect the rules of how we evaluate it. We see that the conditional expression is false since seven is not greater than or equal to eight. There's no else clause. So we move the execution arrow immediately past the closed curly brace and continue execution. There are not any more statements inside the else clause. So we move the execution arrow outside of the else clause and keep going. Next, we have a return statement. So we evaluate the expression x minus two to find that the return value is five. This then gets returned to the place we noted. So we destroy the frame for f and return back to g. We finish the assignment statement and now we're ready for the return statement from g. We execute that and we're done with method g. 
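The two hand traces above (the nested calls that end with b getting 75, and the if and else example) can be checked by running them. The code shown on screen is not part of this transcript, so what follows is only a reconstruction from the narration. The class names, the exact printed text, the body of the nested if, and what each g returns are assumptions made for illustration.

```java
// Hypothetical reconstruction of the traced examples; not the original course code.
class TraceDemo {
    public static void main(String[] args) {
        System.out.println("CallTrace.g() returned " + CallTrace.g());
        System.out.println("BranchTrace.g() returned " + BranchTrace.g());
    }
}

class CallTrace {
    static int myFunction(int x, int y) {
        int z = 2 * x - y;      // first call: 2*3 - 7 = -1; second call: 2*9 - 10 = 8
        return z * x;           // first call: -3; second call: 72
    }

    static int f(int n) {
        return 3 + myFunction(n, n + 1);   // with n = 9 this is 3 + 72 = 75
    }

    static int g() {
        int a = myFunction(3, 7);   // a gets -3
        int b = f(a * a);           // a * a is 9, so b gets 75
        return a + b;               // assumed: the narration does not say what g returns
    }
}

class BranchTrace {
    static int f(int x, int y) {
        if (x < y) {
            System.out.println("x < y");
            return y + x;
        } else {
            System.out.println("x >= y");
            if (x >= 8) {                          // narrated as false when x is 7
                System.out.println("x is large");  // assumed body, never described
            }
        }
        return x - 2;
    }

    static int g() {
        int a = f(3, 4);    // takes the then clause, a gets 7
        int b = f(7, 5);    // takes the else clause, b gets 5
        return a + b;       // assumed: the narration does not say what g returns
    }
}
```

Stepping through this in a debugger, one line at a time, should show the same frames appearing and disappearing as in the hand trace.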
As you continue learning how Java works, it is important to talk for a bit about objects and classes. Before we delve into the specific semantics, let's talk for a minute about the high level concepts. When you are writing a program, you often have data in the form of variables which store the values you are computing on and code which manipulates that data according to the algorithm you have designed. Object-oriented programming is a paradigm of programming languages that groups data and the code that manipulates it together into logical units called objects. This language design aims to help programmers think about their program by grouping the code and data together into one logical unit. As you write larger and larger programs, this principle becomes more and more helpful. As you progress and learn more about Java, you will learn about a lot of important principles that go along with these ideas. For now, however, we are just going to start with the basics. Here, you can see an example of a class declaration. A class is a template that specifies how to make objects. Let's look at each piece of this declaration. The first line tells Java that we are declaring a class called point. As with variables, we have a lot of freedom in what we name classes that we create, but we should name them descriptively. Here, we are making a class which represents a two-dimensional point, so point is a good name for it. Next, we declare two fields, int x and int y. Field is the name for a variable that is inside of an object. They are also called instance variables since they are variables that are in each instance of the objects created from the class. These look like variable declarations except that the word private comes before them. Private means that only code inside of this class can directly manipulate these fields. You'll learn more about why that is important as you become more skilled in Java, but for now, we'll just make all fields private. 
Next is the declaration of a constructor for the class. A constructor specifies how to create objects of this class. It is code that gets run when an object is created to initialize that object. Note that the constructor looks like a function but has no return type and is named the same as the class. These are the hallmarks of a constructor declaration. In front of the constructor, we have the word public, which means that any code can use this constructor to create a point. After the constructor, we have three methods, get x, get y and distance. Methods are functions that are inside of classes. In Java, everything is inside of a class, so technically, all functions in Java are methods. These methods are invoked or called on a particular object and implicitly act on that object. You can see a method call here where the code says other point dot get x. This calls the get x method on the other point object. It will get the x of that particular point object. Last, we have the declaration of a static method. These behave a little differently from regular methods. They don't act on any particular instance of the class. They just belong to the class in general. That concept is a little tricky and we'll explain it more later. This method is called main, which is a special method. If you run your programs outside of BlueJ, main is the starting point. Execution begins in main before any objects are even created. The ideas of objects are supposed to help programmers think about their data in terms of objects that make logical sense. For example, if we make a new point, we are creating an object which represents something we can concretely think about. In this case, a point on a plane. We can then make another point which has its own x and y and represents a different instance of the same type of thing, another point on the plane. You can of course create as many of an object as you need for your algorithm. 
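As a reference for the pieces just described, here is a sketch of what such a class might look like. It is assembled from the narration rather than copied from the course materials, so the parameter names (startX, startY, otherPt) are assumptions. The values in main are the ones used when this code is traced by hand.

```java
// A sketch of the Point class described above; parameter names are assumed.
class Point {
    // Fields (instance variables): each Point object has its own x and y.
    private int x;
    private int y;

    // Constructor: no return type, same name as the class.
    public Point(int startX, int startY) {
        x = startX;
        y = startY;
    }

    // Methods act implicitly on the object they are called on.
    public int getX() {
        return x;
    }

    public int getY() {
        return y;
    }

    public double distance(Point otherPt) {
        int dx = x - otherPt.getX();
        int dy = y - otherPt.getY();
        return Math.sqrt(dx * dx + dy * dy);
    }

    // Static method: belongs to the class, not to any one Point.
    public static void main(String[] args) {
        Point p1 = new Point(3, 4);
        Point p2 = new Point(6, 8);
        System.out.println(p1.distance(p2));   // prints 5.0
    }
}
```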
Once you have some objects, you can invoke methods on them, such as p1.distance p3, which you can think of as asking p1 to compute the distance to p3. That is, you can think of this line of code as saying p1, go figure out how far you are from p3. You can think of the code that executes for this method call as logically belonging to the p1 object. Now that you have the high level concepts, let us delve into the step-by-step execution of our example point code. We're going to start in main, which we'll talk about more later. Args gives you access to the command line arguments, which you won't need for quite a while, so we are just going to ignore that. The first line declares point p1 and initializes it to a new point. Note that classes are types, so we can use class types that we make to declare variables. Here, we want p1 to be a point. Until we finish evaluating the right hand side, we're going to draw this as an arrow with a flat end. We'll use this notation to indicate when a variable does not reference an object. Note that this is a bit different from numeric variables like ints, whose initial value is zero. We've colored this flat-ended arrow red to remember that we have not yet explicitly initialized this variable. Now, let us evaluate the right hand side of the initialization expression. We are doing new, which will create a new object. What type of object are we going to create? We look right after new and see that we are creating a new point. So we are going to draw a box for a point which has fields x and y. Note that the box we just drew is not in the frame but is outside of it. It is in a different area called the heap. Anytime you use new, you create data in the heap. The important difference is that data in the heap does not go away when a function returns destroying its frame. Note that we have put zero in the fields of this new point and colored them red as we have not explicitly initialized them yet. 
Speaking of initialization, remember that we said that the whole point of a constructor is to initialize a newly created object. The next thing that happens is that we call the constructor to initialize this point. As with any call, we set up a frame and pass parameters. However, constructors and methods take an additional implicit parameter which tells them which object they are operating on. That parameter is called this and its value is an arrow which points at the object that is being acted on. In this case, this points at the object that we are creating to tell the constructor which object to initialize. Now we enter the code for the constructor and begin executing statements there. The first line says x gets start x. There's no x in this frame so where do we store the value of start x? Here, x refers to the field inside of this object. That is, we want to put start x's value into the x field of the object we are initializing. So to find the right box, we follow the arrow for this and then look for the x field in this object and store the value three into that box. On the next line, y again refers to the field inside of this object. So we update that field to be four. Now we have finished the code inside the constructor and are ready to return to main at call site one. Back in main, we need to finish this assignment statement. To do that, we need to store the value of the right hand side into the box for P1. A new expression evaluates to an arrow pointing at the object that it created. So we'll make P1 to point at the newly made point. The next line does a similar thing. It declares P2 and initializes it to a new point. So once again, we create a box for P2. We haven't initialized it explicitly yet so we have a default value of a flat-ended arrow meaning it does not point at any object yet. We color it red to remember that it is a default value. 
We then create a new point with its x and y fields set to default values of zero and we call the constructor to initialize this point. Notice how this points at the newly created point. With multiple points in our world, it is important that we can keep track of which point we are working on. We then initialize the x inside of this point to be six. As before, we found the right box by following the arrow from this and we initialized the y inside of this point to be eight. Now, we have finished the constructor and are again ready to return to main. In main, we finish the assignment statement setting P2 to point at the newly created object. We'll stop there and pick up with executing the method call in the next video. In the previous video, we executed the declaration and initialization of points P1 and P2 and you learned about new and constructors. Now, it is time for you to learn about method calls. Method calls work a lot like function calls except that we have to pass the this parameter to let the method know which object it is working on. Let us resume where we left off. This line is going to call P1 dot distance P2 and then print its return value. We need to set up a frame for distance which will take two parameters. The implicit this parameter which tells it which object it is acting on and the other point parameter which is explicitly passed to it. The this parameter has the same value as the variable before the dot. In this case, we have P1 dot distance so this has the same value as P1. It is an arrow pointing at the same point object. We have colored this arrow differently but that does not have special meaning. It is just to help you keep straight which arrow is which in this diagram. For the other point parameter, we just copy the value that is passed in. In this case, P2. So the value of other point is an arrow pointing at the same point object as P2. Now, we go into the distance method and begin executing code there. 
Executing the first line, we are declaring a variable DX and need to initialize it to X minus other point dot get X. To evaluate that expression, we need to call other point dot get X. So we need to set up a frame for that method call. Again, this method takes an implicit this parameter to tell it which object to do get X on. Which object should this point at? Well, we did other point dot get X, so we copy the value of other point, which is an arrow pointing at that point object. Now we go inside of get X and begin executing code there. Here we have return X. So we need to evaluate the expression X and return its value. How do we get the value of X? We follow the this pointer to find the object we are working on and then look for the X field inside of that object. The value of that field is six, so that is the value of the expression X here, which is what we'll return to the caller at call site two. Returning to call site number two, we want to evaluate X minus six. How do we evaluate X here? We again look at the this pointer, which refers to this point object. And we get the X field inside of that point object. That value is three, which is the value of X in this expression. So we will compute three minus six and initialize DX to minus three. Now we are going to declare and initialize DY using a very similar process. First we make a box for DY. Then we call get Y on other point. Notice how this is a copy of the value of other point, an arrow pointing at our second point object. We get the value of the Y field out of this object, which is eight, and return it to call site two, where we also return the execution arrow. We evaluate Y by looking in this object and finding its Y field, which is four. Now we are ready to finish the initialization of DY to four minus eight, which is negative four. Our next line has a little bit of math. Let's look at it in detail. We are calling Math.sqrt of DX squared plus DY squared. 
First we can compute the value of the parameter, which is nine plus 16, which is 25. So we need to evaluate Math.sqrt of 25. This looks like a method call, but where did the Math object come from? It turns out that Math is a class, not an object, and is part of the Java library. This is a static method call. The method is being called on the class in general, not on any particular object. Here the Math class is just a handy place to put a bunch of mathematical functions together. Since we don't have the code for Math.sqrt, we have to know what effect it has. If we did not know, we would look it up in the Java documentation. However, as you might have guessed, this method just computes the square root of its argument. So Math.sqrt of 25 will return 5.0. Our distance function returns that value to its caller, so the value of the call to distance in main will be 5.0. We then return execution to main, where we are ready to complete the print statement, which will print out 5.0. Now we are done with main, so we will return from it, destroying its frame and exiting our program. At this point you have seen a variety of types such as int, point, and file resource. But what exactly is a type? A type specifies how data should be represented, interpreted, and operated on, as well as what operations you can do with it. One important rule of computing is that everything is a number. If you didn't learn about the everything is a number principle in whatever intro courses brought you here, we'll give you some links to videos about it. Specifically, this means that everything is stored in the computer's memory as bits: ones and zeros. But not all numbers mean the same thing. Some numbers might mean plain numbers while others might mean letters and others might mean the locations of data in the computer's memory. So the type of a value specifies how to interpret those numbers. It tells Java how to assign meaning to the ones and zeros stored in memory. 
The type also specifies what operations you can perform on the data. The type tells Java not only what you can do but how it should be done. Let's talk about both of these points in more detail. We just said that the type tells Java how to interpret the ones and zeros in memory. Let's talk about that a little bit more. On the left we have the conceptual representation of the program state that we have been working with. There's a box for a variable called x. On the right we have a bunch of bits from the computer's memory. The blue ones correspond to x. But what do they mean for the value of x? Well, if x is an int, then these bits would mean it has a value of 1,234,567,890. We're not going into the details of how you can figure that out here, nor do you need to know it for beginning Java programming. But if you take a computer organization class, you will learn a lot more about how data is represented. The point we want to make here is that if x had a different type, like float, then the same bits would have a different meaning. These same ones and zeros would mean 1,228,890.25. And if x were a string, then the bits would be the location in the computer's memory of the actual string object, which would have a sequence of characters. Those would also be stored as a bunch of bits, which would be interpreted as letters since their type would be char. We also said that the type tells us what operations we can do and how they're done. Consider this simple bit of code, x plus y. Is it legal? And if so, what does it do? To answer this, you need to know the types of both x and y. If x and y are both ints, then this code is legal and performs integer arithmetic. If x and y are both strings, then this code is also legal but performs string concatenation. It makes a string with the letters of x first, then the letters of y immediately after. 
Notice that even though the plus operation is legal for two different types, we may perform that operation differently for one type than for the other. If x and y are both points, then this code is not legal. While we're talking about types, you might be wondering what you do if you need to convert between types. The answer is that it depends. Some type conversions can happen implicitly. If you have an int and you need to convert it to a double precision floating point number, the compiler will just automatically insert the conversion for you without complaint. The general rule is that you can use implicit conversion whenever the compiler will consider it safe. Here we are turning three into 3.0, which is not a problem. Note that the compiler does not consider the values, only the types, when deciding if an implicit conversion is okay. For some type conversions, you can explicitly cast. This means that you tell the compiler that even though what you're doing is questionable, you are sure you want to do it. Here we are turning the double 3.14 into an int, which will discard the fractional part, leaving us with x equals three. The compiler wants to be sure that we meant to do this, so we explicitly cast by writing int in parentheses. Other conversions require calling methods to calculate the converted value. For example, if we have the string quote three, quote, and want to turn it into an integer, we can't just directly cast it because the conversion is actually somewhat complicated. Instead, we have to call a method like Integer.parseInt, which will perform the conversion. The last thing we will mention about types is that there are two major categories of types in Java, primitives and objects. There are eight primitive types: int, double, char, boolean, long, float, byte, and short. We'll primarily use the first four of these. Variables of primitive types hold their value directly in their box. 
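The points about types made so far can be tried out directly. The sketch below is illustrative only: the class and helper method names are made up for this example, while Float.intBitsToFloat and Integer.parseInt are standard Java library methods.

```java
// Illustrative sketch; TypeDemo and its helper names are hypothetical.
class TypeDemo {
    // Reinterpret the 32 bits of an int as a float without changing any bit.
    static float sameBitsAsFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    // Explicit cast: the fractional part is discarded.
    static int truncate(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        // Same bits, different type, different meaning.
        int x = 1234567890;
        System.out.println(sameBitsAsFloat(x) == 1228890.25f);   // prints true

        // The type decides what + does.
        int sum = 3 + 4;              // integer arithmetic
        String joined = "3" + "4";    // string concatenation
        System.out.println(sum);      // prints 7
        System.out.println(joined);   // prints 34

        // Implicit conversion: int to double is considered safe.
        double d = 3;
        System.out.println(d);        // prints 3.0

        // Explicit cast: we tell the compiler we really mean it.
        System.out.println(truncate(3.14));           // prints 3

        // Some conversions need a method call rather than a cast.
        System.out.println(Integer.parseInt("3"));    // prints 3
    }
}
```

Trying to write x + y for two Point objects, by contrast, would be rejected by the compiler, which is the "not legal" case mentioned above.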
Primitive types don't have methods, so you can't do .method call on a primitive, and they can't be null, although each primitive type has an associated wrapper class, which gives you an object to hold that primitive. Everything else is an object type. Some are built into Java like string, others are part of libraries that you might use like point or file resource, and yet others are classes you will create yourself. Whenever you make a class, the class you make is its own new type. Unlike primitives, the value of variables of object types is an arrow pointing at the object. This arrow is called a reference. You can invoke methods on the object with .method name, and the reference can be null, meaning it does not refer to any object. If you do equals equals on two objects, you're checking to see if the arrows point at exactly the same object. Okay, that's the basics of types. We know it's a lot to absorb at once, but you'll get better with these ideas as you practice more Java. Hi, now we're gonna learn how a for each loop works. This piece of code looks similar to the hello around the world example that you started with earlier. To make this work, we need to add an import statement to the top to tell Java where to find the file resource class. File resource is in a package that we provide to you to let you manipulate data in files before you learn the more advanced techniques and concepts that would let you do this in Java directly without the classes we've created. Accordingly, this is found in the edu.duke package. If you wanna run this code in BlueJ, you could do so by creating a hello world object, and then invoking its run hello method. We could also add a main method, which does the same thing if we wanna be able to run the code directly but outside of BlueJ in another environment. Let's start executing the code by hand. We start in main with a frame that contains args. We're not gonna use this args argument, so we're not gonna worry about its value. 
The first line declares hw and initializes it using new. You learned about new in a previous lesson, but there's a slight difference here. The hello world class does not have a constructor. So what do you do? In this case, Java provides a default constructor, one with no arguments, that does nothing. So we execute this line as you're used to, but we don't call a constructor. We make an object. Class hello world has no fields, so the object itself doesn't have any state or anything else we can see in it. Then we finish the assignment statement. Next, we call run hello, passing in hw as the implicit this argument. Inside of run hello, our first statement declares the variable f and initializes it to a new file resource object. This raises two questions. First, what does a file resource object look like, meaning what fields are there in the file resource object? It turns out we don't actually need to know this precisely. The details of how a file resource object works can remain hidden from us, as long as we know what it does. This is an instance of the important programming principle known as abstraction. Second, what does the constructor for a file resource object do? To find the answer to this question, we would consult the documentation for file resource. Consulting documentation for libraries, sometimes called APIs, that you use in programming is an important task. Since file resource comes from the Duke Learn to Program libraries, we would look at the documentation website and we'd read about the constructors for the file resource class. There are three constructors and the one we want is the one in the middle, since we're passing a string literal in, the string quote, file.txt, end quote. This documentation says the constructor will find the file with that name on our computer. Given this information, we're just going to represent the file resource object as knowing what file we asked for, the file whose name is quote, file.txt, end quote. 
We don't know precisely what fields are stored in this object, but that's okay. We'll represent it as precisely as we can because we know what the object does. We finish the assignment statement, so now f references this file resource object we just created by calling the constructor. The next line is new to us. It's a for loop. In particular, this for loop is often called a for each loop because it does something for each value in an iterable. What's an iterable? The short and simple answer is that it's an object which gives you a sequence of values. Let's dig a little more closely into the details of this loop. The first part declares a variable. The type is string and the name of the variable is line. This declaration will behave like any other. We'll create a box for line. Next is a colon. This is the syntax of a for each loop. We don't use an equal or assignment operator since we aren't assigning one value to line. Instead, we're going to use each value in the iterable sequence. Many people read the colon as in when they read the code out loud or to themselves: for string line in f dot lines. Next, we have an expression which evaluates to the iterable whose values we want variable line to refer to, one after the other. In this case, that expression is a call to f dot lines. To understand what this does, we'll need to understand what f dot lines returns. So again, we need to consult our documentation. From this API documentation, we see that f dot lines gives us an iterable whose values are each line in the file in the order that they appear. This means that we'll need to look at the file whose name is file dot txt to see what its contents are in order to understand and simulate the code behavior. On my computer, I've made file dot txt and put these two lines in it, hello on one line and world on the next line. It isn't a very exciting file, but we want this example to be relatively short. 
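For reference while following the rest of this trace, here is a sketch of roughly what the code being executed might look like. The on-screen code is not in the transcript, and the course's FileResource class lives in the edu.duke package, which is not part of standard Java. So this version swaps in a hypothetical lines() helper that returns the same two strings the file is said to contain. Everything else follows the narration.

```java
import java.util.Arrays;

// Sketch only: lines() is a made-up stand-in for f.lines() on a FileResource.
class HelloWorld {
    // Any Iterable<String> works in a for each loop; here it is a fixed list
    // holding the two lines the narration says are in file.txt.
    static Iterable<String> lines() {
        return Arrays.asList("hello", "world");
    }

    public void runHello() {
        // With the course library, the narration describes this instead:
        //   FileResource f = new FileResource("file.txt");
        //   for (String line : f.lines()) { ... }
        for (String line : lines()) {     // read as: for String line in lines
            System.out.println(line);
        }
        // line is out of scope here; it only exists inside the loop.
    }

    public static void main(String[] args) {
        HelloWorld hw = new HelloWorld();   // default constructor, no fields
        hw.runHello();                      // prints hello, then world
    }
}
```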
Returning to our code, we now create this sequence of strings, hello and world. We'll create the box for line and we'll make it refer to the first element of this iterable sequence. Now we go into the body of the for loop and begin executing statements there. The next line is a print statement. So we print out the value of line, which is hello. Now our execution arrow is just before the closed curly brace of the for each loop. We've reached its end. When you reach the end of a for loop, you move the execution arrow back to the start of the loop to do the next iteration, the next time through the loop with a new value for the loop variable. In this case, our loop variable is the string line. So we need to update it to have the next value in the iterable. We update line's arrow to point at the next value in this sequence and again go into the body of the loop. We encounter our print statement, but this time line has a different value. So we print world. Now we've once again reached the end of the body of the for loop. So we go back to the beginning, the start of the loop. We need to update line to refer to the next value in the iterable sequence. However, if we try to do that, we'll find that there are no more elements in the iterable sequence. We already used the last one. So instead we exit the loop. We move the execution arrow past the body of this for each loop and begin executing code there. When we do this, the loop variable, in this case the string line, goes out of scope. That means it no longer exists. So we remove its box. Now we're ready to return from the method runHello, destroying its frame. Since the call to that method runHello was the last line in main, we also return from main, exiting the program. Today you're going to learn how to solve programming problems using the seven step approach that we will use throughout the rest of this course. When you're solving a problem, you're going to start with a problem statement. 
You know that you want to end up with working code, but going straight from a problem statement to working code is a rather large leap. It can take some significant thought and work. This is why we break it down into seven steps for you, giving you a manageable approach where you know what to do next to solve the problem. This is the seven step process that we recommend you use whenever you are solving programming problems that are difficult. As your programming skills improve, many problems will become easy enough to just do them in your head. However, there will always be some problems that are harder and having a step-by-step approach will be helpful to you. Let us look at each step in a bit more detail. The first step is to work an example of the problem yourself by hand. This should be a small instance of the problem, something with about four or five pieces of data to manipulate. You don't want to try to process a million pieces of data yourself. That would take forever. If you have trouble doing the problem yourself, you cannot write a program to do it. So what should you do if you get stuck here? Well, one of two things could be wrong. Maybe the problem is just unclear. The problem statement does not give you enough information about what you're supposed to do. In a classroom setting, you could consult your teacher or TA. In a professional setting, you might work with your technical lead or customer to clarify the requirements, or you might just need to refine the problem statement yourself. The other potential problem is that you lack domain knowledge. The knowledge of the field that the problem belongs to. If you're trying to write a program to compute physical motion and you do not know the physics equations that you need, that would be a lack of domain knowledge. When you have this sort of problem, you need to find the relevant domain knowledge before you proceed. In step two, you want to write down what you just did to solve this problem. 
You want to be as exact as possible. Do not leave anything out and write down how you solved it in a step-by-step fashion. At this point in the process, you are just writing down the step-by-step approach for the one particular instance you solved, not the more general problem. The tricky part in this step is that we often do things without thinking about them. If you gloss over something or you're not precise about what you did, it will make the later steps harder. In step three, you want to move from the particular instance that you solved in step one to an algorithm that works for any instance of the problem. That is, you want to devise an algorithm which can solve the problem correctly for any input that you give it. You'll do this by finding patterns in what you did and replacing specific behavior with more general behavior based on that pattern. Some important tools for finding patterns, which we will delve into more deeply soon, are looking for repetitive behavior, finding behavior which you do sometimes but not always and figuring out under what conditions you do it, and figuring out how specific values you use relate to the parameters you picked. So what should you do if you have trouble with this step? Go back and try steps one and two again. Use different inputs, which will give you another example to work from. You can see the steps for a different instance of the problem and have more information to help you find the patterns. In step four, you want to check your algorithm before you proceed to turn it into code. If you found the patterns incorrectly or otherwise made a mistake in step three, you would like to find that out now. The way you check your algorithm is to pick at least one different input. Again, it should be a small one. Then follow the steps of your algorithm for it. If your algorithm gives you the right answer, then you are ready to move on. If not, you should go back and fix it first. 
Now that you have devised an algorithm to solve the problem, you're ready to translate that algorithm into code. This step is where the syntax of a specific programming language comes into play. You need to write down your steps in the syntax of that language. Once you've written your code, you want to be sure that it works correctly. So you run test cases on it. Running a test case involves executing the code on a particular input and checking if it produced the right answer. The more test cases your code passes, the more confident you can be that it is correct. However, no amount of testing can guarantee that the code is right. When your program fails a test case, you know something is wrong. And when that happens, it's time to debug your program. You can watch a review video about debugging if you need more information, but at a high level, you will apply the scientific method to understand what is wrong with your program and determine how to fix it. Once you have identified the problem using the scientific method, you will need to revisit a previous step to fix it. If the problem is in the algorithm you designed, you will want to go back to step three and rethink your algorithm. If your algorithm is correct, but you implemented it incorrectly in code, you will want to return to step five to fix your code. Now you have learned the seven-step approach to solving a programming problem. In the next video, we will work through an example of the process. This process can guide you through programming problems you need to solve not only throughout the rest of this course, but whenever you need to solve a difficult problem. Thank you. Hi. In this video, we're going to walk through an example of using our seven-step process to solve a programming problem. In particular, the problem we're going to work on is, given a shape, find its perimeter. Step one is to work an instance of the problem yourself. That means you'll need to draw a shape and find its perimeter. 
We'll need a little domain knowledge here for this problem. Specifically, what is the perimeter of a shape? If you don't recall, the perimeter is the sum of the lengths of all the sides of the shape. We'll need to know that a shape is defined by its points and the points are listed in order as they appear around the perimeter. First, we'll draw a coordinate grid so we can draw our shape precisely and carefully. Then, we'll draw a shape on that grid. We're noting the coordinates of each point in the shape as we draw it. This will allow us to easily do math on them to compute the lengths of the edges of our shape. Once we have the shape, we can start finding its perimeter. This left edge has length four and this bottom length edge has length five so we can add them together and get nine. The running total of our perimeter. The next edge is the diagonal so we'll need to do a little bit of math. The difference in the x's is three and the difference in the y's is four. The square root of three squared plus four squared, nine plus 16 or 25, is five. That's the length of this edge. We can then add nine plus five so that our running total is 14. Now we see that the last edge has length two. We add two to 14 to get 16 so 16 is our answer for this particular instance of the perimeter problem. Now you're ready for step two, writing down specifically what we just did. First, we found the distance from the first point to the second point which was four. Then we took the second point to the third point which was five. Then we added four plus five to get our running total of nine. Next, we found the distance from the third point to the fourth point which was five and added nine plus five so our running total is 14. Then we found the distance from the fourth point back to the first point which is two and we added 14 plus two to get 16. Last, we said that 16 was our answer. So we can write down the steps to solve this particular instance of the problem as you can see here. 
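The step-two arithmetic above can be written out as a small Java sketch. This is only an illustration of the one instance we worked by hand, not the general algorithm. The class name, the helper method, and the corner coordinates (-1, 3), (-1, -1), (4, -1), and (1, 3) are assumptions chosen to match the edge lengths four, five, five, and two.

```java
class TrapezoidByHand {
    // Straight-line distance between (x1, y1) and (x2, y2).
    static double distance(double x1, double y1, double x2, double y2) {
        return Math.hypot(x2 - x1, y2 - y1);
    }

    // Mirrors the written-down steps for this one shape only.
    static double perimeterOfExample() {
        double total = distance(-1, 3, -1, -1);   // first to second point: 4
        total = total + distance(-1, -1, 4, -1);  // second to third point: 5, total 9
        total = total + distance(4, -1, 1, 3);    // third to fourth point: 5, total 14
        total = total + distance(1, 3, -1, 3);    // fourth back to first: 2, total 16
        return total;
    }

    public static void main(String[] args) {
        System.out.println(perimeterOfExample());
    }
}
```

Notice that every line after the first has the same shape, which is the kind of repetition step three looks for.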
Now we're ready to move on to step three, where we're going to find the patterns and generalize to find the perimeter of any shape, not just the one we just saw. One thing you might notice is that we're doing almost the same thing repeatedly. When we generalize, we want to look for similar steps and express them as repetition. To do this, we'll need to make the steps match up exactly, which we might do here by starting out by adding zero plus four to get four. Why does this seem like a good way to make these match up? We keep adding each new result to our running total of all the lengths, so it makes sense to start with zero for our running total and add our current result to it. The next thing we might do in generalizing this algorithm is give this quantity a name. It won't always have these values, so we should name the quantity and refer to it by that name. We'll call it currDist, since it's the current distance. When we name this quantity currDist, we'll also want to change all the places we use that value after we compute it to reflect the name that we just chose. This gives us an algorithm that looks like this. Next, we should give this other quantity a name. Here, totalPerim, or total perimeter, makes sense, as this quantity is the total perimeter of the shape that we're calculating as we go through the steps. Again, when we name the quantity, we'll need to replace the places we use that value with the name we just gave it, totalPerim. This gives us an algorithm that looks like this. Let's stop and look at this algorithm for a second. Does anything strike you as maybe just a little bit odd? What about this place where we used zero? It isn't a previous value that we computed, so why did we put this line in to begin with? We wanted all the steps to repeat exactly, but now they look different. Can we make them look the same? Sure, we can make totalPerim start at zero before we begin the repetitive steps. Then we can just update totalPerim in our first step.
Now these steps form a repetitive pattern. They're the same except for which particular points they're working with. The first point in each repetition counts through the points of the shape in the order in which they appear. That's great. We like it when we can iterate over a sequence, because we can ultimately express this in code with a for each loop. But the second point in each repetition is a bit more problematic. We'd have to have a way to get to the point after the current one. Now, there are ways that we could set up our shape's interface, or API, to iterate over the points and ask for the next one too, but let's do something very clever. Let's reorder the steps so that we do the distance from the fourth point to the first point first. That is, let us write the steps in this order. First, is it okay to do this? When we want to reorder things, we have to think about it and be very careful. Here the reordering is totally fine, since addition is commutative. It doesn't matter what order we do the addition in. Second, why is it useful? Well, now we're going through the points in order, giving us a natural for each repetition, and the other point, which does not lend itself to the for each repetition, is the one we just used. That means we can simply remember the previous point in a variable and use it with the next point. We express this idea by updating our algorithm to look like this. Notice how we update prevPt to be the point we just finished with before moving on to the next point. Then we make use of that point in the next set of steps. But what about this point? We can use the same idea we saw earlier when we initialized totalPerim to zero. Start prevPt out with the value we want before we begin repeating the steps. But will it always be the fourth point? No. It just happens that we had four points here, but in general we'll want to start prevPt as the last point in our shape. Okay, great.
Now we have nice repetitive steps where the only difference is the point we're working on and those go in order through the points in the shape. So we can express all these steps in the colored boxes that you see here as a repetition for each point of the steps. This gives us a nice general algorithm for finding the perimeter of any shape. Later we'll translate this into code. Hi, we've developed an algorithm to find the perimeter of any shape. But are we ready to turn this algorithm into code? Well, we could, but we'd like to be confident that it's right before we do that. After all, there were a lot we had to do to generalize our steps and it's entirely possible that we made a mistake. Or perhaps we just didn't think through all the special cases. So before we turn this into code, we should test it out. To test the algorithm, we need a different instance of the problem, something other than what we used to make the algorithm. In fact, it's good if our test instance is pretty different from the one we used to make the algorithm. Here, we've shown a triangle. Instead of the four sided trapezoid we used when we developed the algorithm. Before we go any further, take a second to figure out what the right answer is. What is the perimeter of this shape? When we finish, you'll want to check if the answer to our simulated algorithm is correct. And to do that, you'll need to know the right answer. Now, we'll execute the algorithm by hand for this particular input. As we test the algorithm, notice the similarities between the code and English. We're gonna execute this English algorithm by hand just as we executed code by hand. They both work pretty much the same way and that's not a coincidence. When you turn this algorithm into code, you want to write down code that has the same semantics as the English, the same meaning. The code should transform the program state in the same way that the English transforms this diagram. 
So we'll start with totalPerim and set it to zero, and with prevPt being the last point in the shape. But what is the last point? We'll say the points in this shape start at the top and go counterclockwise. So this point in the lower right-hand corner is the last one, and we'll initialize prevPt to be that point. We'll note briefly that if these were actual Java objects, prevPt might be an arrow pointing at an object. But we're going to just write down the coordinates here to keep the diagram simple and legible. Next, we're going to do the steps for each point. So we need to start at the first point, which is this one at the top of the diagram. That will be the initial value of currPt as we enter the for each repetition. Then we'll find the distance between these two points, which is 10. We'll update totalPerim to be 10, zero plus 10. And then we'll update prevPt to be currPt. Now we're at the end of our for each repetition. So we'll update currPt to be the next point in the shape, which is negative three, negative four. When we update currPt, we go back to repeat these steps. We repeat the steps again, finding the distance between those two points, which is eight. And then we update totalPerim to be 10 plus eight, or 18. And then we update prevPt to be the current point. Then we go back to the top of our loop, updating currPt. We repeat these steps for the last point in our shape, after which we've gone through all the points. So we skip to the steps after our repetition. Here we can say that totalPerim is our answer. totalPerim is 24. Is that the answer you came up with earlier? Yes, that is the perimeter of this shape. The fact that our algorithm came up with the right answer here gives us more confidence that we generalized correctly. We're done executing the algorithm by hand, and we have good confidence that it's correct. So we're ready to turn the algorithm into code.
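The hand execution above can be mirrored by a short sketch that records the running total after each repetition. The class and method names are made up for illustration, points are plain coordinate pairs rather than the course's objects, and the top corner (-3, 4) is an assumption inferred from the distances 10 and 8 in the walkthrough.

```java
import java.util.ArrayList;
import java.util.List;

class PerimeterTrace {
    // Returns the running total after each pass through the repeated steps.
    static List<Double> runningTotals(double[][] points) {
        List<Double> totals = new ArrayList<>();
        double totalPerim = 0.0;
        double[] prevPt = points[points.length - 1];  // start with the last point
        for (double[] currPt : points) {
            double currDist = Math.hypot(currPt[0] - prevPt[0], currPt[1] - prevPt[1]);
            totalPerim = totalPerim + currDist;
            prevPt = currPt;                          // remember the point we just used
            totals.add(totalPerim);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Triangle listed from the top, going counterclockwise.
        double[][] triangle = { {-3, 4}, {-3, -4}, {3, -4} };
        System.out.println(runningTotals(triangle));  // totals of 10, 18, then 24
    }
}
```

Each entry in the returned list corresponds to one update of the total in the diagram, so a mismatch would show which repetition went wrong.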
All right, now you've developed the algorithm to find the perimeter of a shape, and we want to turn it into Java code. You've seen a lot of Java code as we've talked about the syntax and semantics. So what we're going to do is go through step by step, take the algorithm that we wrote, and turn it into code whose semantics match what we wanted our algorithm to do. So here I have a class called PerimeterRunner. It's going to have the method getPerimeter, which takes a shape and returns a double. It's got some other code in here. It has a main that we can run, which will make an instance of this class and call testPerimeter, which will use FileResource to read from a file, create a shape, call getPerimeter, and then print that out. We're going to go through and translate this to code, and I've written our algorithm here as comments, which are things for people that the compiler will ignore. So the first thing we say is start with totalPerim equals zero. So this sounds like we need a variable. We called it totalPerim. We're going to set it equal to zero. We need to think about what type we want. Since we said it equals zero, we might want an int, but if we think a little more carefully, we might realize we want a double, since we're working with floating point numbers that might have fractions. So double totalPerim equals zero. And if we'd written this as int, we hopefully would find that out in testing, when we come up with answers that should have fractions and we don't have them. Then we say start with prevPt equals the last point. So we need another variable, prevPt. What type of variable is this? Well, this is going to be a Point. And it's going to be the last point. We meant in s, even though we didn't write that down. So we'll call getLastPoint. How do I know that was there? I looked at the documentation for Shape before I started doing this video and saw that it has a getLastPoint method.
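To see why the type of the running total matters, here is a small sketch that adds the same three edge lengths into an int and into a double. The class name and the little right triangle with corners (0, 0), (1, 0), and (0, 1) are assumptions made up for this illustration.

```java
class TotalTypeDemo {
    // Edge lengths of a right triangle with corners (0,0), (1,0), (0,1).
    static final double[] EDGES = { 1.0, Math.sqrt(2.0), 1.0 };

    static int sumAsInt() {
        int total = 0;
        for (double edge : EDGES) {
            total += edge;  // compound assignment casts the sum back to int
        }
        return total;
    }

    static double sumAsDouble() {
        double total = 0;
        for (double edge : EDGES) {
            total += edge;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumAsInt());     // 3, the fraction is lost
        System.out.println(sumAsDouble());  // roughly 3.414
    }
}
```

If the int version were written as total = total + edge instead, the Java compiler would reject it as a possible lossy conversion from double to int, so the mistake might be caught even before testing.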
Then we say for each point, which we wanted to call currPt, in the shape. So this sounds like a for each loop. For each Point currPt in s.getPoints, which is going to give us all of the points, we want to do these steps. And I'm going to go ahead and put them in curly braces. We want to find the distance from the previous point to the current point and name it currDist. Any time we want to name a quantity, we want a variable. And so that's going to be the distance from prevPt to currPt. Then we're going to update totalPerim to be totalPerim plus currDist. And then we're going to update prevPt to be currPt. Last, we said totalPerim is my answer. Whenever we know our answer, we're going to return it, because that's how we give our answer back to whoever called us. So now that I've written this code, I'm going to come up here and click compile. And it says cannot find symbol, variable shape. That's because my shape was called s. And now it says class compiled with no syntax errors. So now I'm going to shrink this window down a little bit smaller. I'm going to come over here and run main, even though that's how we run programs outside of BlueJ. If we've written a main, we can do that. We're going to give it no arguments, so we're just going to click OK. And it's going to ask me what input file has the points for this shape. Each of these files has some points. So, for example, example one has the points (-1, 3), (-1, -1), (4, -1), and (1, 3), which is what we used when we developed our algorithm. So that's a good first check. Make sure we get the answer we expected. And that gives us 16, which you may recall is what we came up with when we worked this example ourselves. Of course, using only that might be bad, because if we made a mistake, we wouldn't necessarily catch it, since we've already used those values in developing this. So we might want another one.
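The translation just described might look roughly like the sketch below. It is not the course's actual file. To keep it self-contained, a tiny stand-in Point class and a plain list of points replace the course library's Point, Shape, and FileResource, and the variable names totalPerim, prevPt, currPt, and currDist are our best reading of the names used in the video. It also assumes the shape has at least one point.

```java
import java.util.Arrays;
import java.util.List;

class PerimeterSketch {
    // Stand-in for the course library's point type (an assumption for this sketch).
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }

        double distance(Point other) {
            return Math.hypot(x - other.x, y - other.y);
        }
    }

    static double getPerimeter(List<Point> points) {
        // Start with totalPerim = 0
        double totalPerim = 0.0;
        // Start with prevPt = the last point
        Point prevPt = points.get(points.size() - 1);
        // For each point currPt in the shape
        for (Point currPt : points) {
            // Find the distance from prevPt to currPt, and name it currDist
            double currDist = prevPt.distance(currPt);
            // Update totalPerim to be totalPerim + currDist
            totalPerim = totalPerim + currDist;
            // Update prevPt to be currPt
            prevPt = currPt;
        }
        // totalPerim is the answer
        return totalPerim;
    }

    public static void main(String[] args) {
        List<Point> example1 = Arrays.asList(
                new Point(-1, 3), new Point(-1, -1), new Point(4, -1), new Point(1, 3));
        System.out.println(getPerimeter(example1));  // the hand-worked answer was 16
    }
}
```

Keeping the algorithm's steps as comments, one per line of code, makes it easy to check that each English step has a matching statement.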
And so this says that this other shape, with points negative three, four; negative three, negative four; and three, negative four, has a perimeter of 24. If you work that out yourself, you'll find that that's the right answer. And so we become more and more confident that this code that we just wrote is correct. So that's how you turn your algorithm into code. You go through step by step, take each step, and write down the code that corresponds to what you wrote. Hi, I'm Dr. Raluca Gordon. I'm a professor in the Duke Center for Genomic and Computational Biology and the Department of Biostatistics and Bioinformatics. My work is strongly based on designing and using computational algorithms, programs, and tools. And I want to tell you just a little bit about that. But first, I'd like you to think about the word strings. What does an orchestra conductor think about when she hears the word strings? What does a sailor think about when he hears the word strings? What does a pianist think about when she hears the word strings? Well, I'm a genome scientist, so I will tell you what I think when I hear the word strings, and that is genomic strings. The genome of an organism stores all the genetic information necessary to build and maintain that organism. This genetic information is stored as a long list, or string, over the four-letter alphabet A, T, C, and G. These four characters correspond to the four DNA bases: adenine, thymine, cytosine, and guanine. The sheer size of the genome makes it difficult, if not impossible, to analyze by hand. The human genome, for example, contains three billion characters. That is a million times more than the characters shown here. Thus, finding any information in the genome requires computational approaches. In addition, the genome is complex and contains different types of information. Computational approaches are needed to find information, including genes, as you can see here.
Finding genes requires more than simply looking for the tags or codons that identify the start and end of a gene. In addition to genes, such as the one shown here in red, we also need to look for regulatory elements, as shown here. We do this with computational tools and techniques. These regulatory elements are shown here as simple letters representing nucleotides. But it's important to remember that these are actually bound by proteins called transcription factors, which help turn genes on and off. My research is focused on identifying such regulatory elements in the human genome using various computational approaches. You will do similar things in this course, and this will prepare you to become a computer scientist or a genome scientist, depending on what your preferences are. Hi, and welcome to this lesson on finding information and patterns in data, which is a very general topic that we will make concrete in working with Java strings. Strings are sequences of letters, digits, punctuation, any character that you might type, for example. Why will you learn about strings? You've learned previously that everything is a number. That's true, as you can see here, where I've captured the beginning of three different files. These files might be stored in memory, on a flash drive, or on your computer's hard drive. The first file was a video, a file with a .mp4 suffix. The second file was an image, a file with a .png suffix. And the third file was a plain text file with a .txt suffix. Can you tell which group of bits of 0s and 1s with which file by simply looking at the 0s and 1s? Some people may be able to do that, but most cannot. Although everything stored on a computer is a number, information stored on a computer is often readable. We use strings to store data so that we can read it and so that we can write programs to read the data and process it. Here are parts of three data files where the data are stored as strings. 
It's important that the data is readable by you, not just by the computer, although we could write programs even if everything was only a number. It will be easier to write programs to find patterns, knowledge and information and data when the data is stored as strings. The first part of a file is genomic data stored in what's called FASTA format. You'll write programs in this lesson to find proteins and genes in genomic data. The second part of a file is from a web page. You'll write programs to find links and other information in a web page, doing at a small-scale what search engines like Google do in ranking pages to be found by those doing web searches. The third part of a file is data from a CSV file or Crime in Sacramento, California in 2008. The CSV file is a file in a special format. The CSV means comma-separated values. You'll write code in a later lesson to process CSV files. We have several goals for you to learn as part of this lesson. You'll learn about the Java string class. You'll learn many details how to write programs with this string class and how that's most often done. You'll learn common string functions and how to read documentation to find out more about strings. You'll learn about Java types and operators. Here you'll learn more about Java's numeric types and operators on those types, which for you will be int or integer and double or a floating-point number. You'll learn about programming to find patterns in data by searching for specific parts of a string. You'll repeat searches to find information and patterns like all the links on a web page or all the genes in a strand of DNA. Let's get started solving problems. Thank you. Hello and welcome back. You are going to be working with strings that represent DNA, solving problems like searching for genes in them. This problem is a great one to learn on because even though we're going to start with a greatly simplified version of it, it's an important problem with real-world applications. 
Of course, the lessons you will learn about working with strings and programming in general will help you far beyond this problem domain. Whatever sorts of problems you want to solve, strings are likely to come up in them in one way or another. HTML, email or really anything that is text that is represented as a string. You are also going to learn some other important lessons as you work on these problems like how to do math and Java. Of course, math is also ubiquitous in programming since everything is a number. And perhaps most importantly, you will get more practice developing and implementing algorithms with the seven steps. Before we dive into DNA-related problems, we need to give you a bit of domain knowledge, some terms and concepts related to working with DNA. Here is a string that could represent some piece of DNA. You will see that it's made up of four letters, A, T, C and G. Each of these represents one nucleotide, which are the basic building blocks of DNA. Three nucleotides together make a codon, which each describe one amino acid. The ATG codon shown here is special in that it indicates the start of a gene. Accordingly, it's called the start codon. And the TAA codon is also special in that it indicates the end of a gene, so it's called the stop codon. There are a couple other stop codons, but for now, we're only going to think about TAA. Everything between and including these two codons makes up one gene. The first problem you're going to work on is finding a gene in a string which represents DNA. That is, you want to write a program which takes a string like this one and gives you all the text between it and including the start codon ATG and the stop codon TAA. You're going to start with a greatly simplified version of this problem, just finding those letters and all the text between them. 
You will not worry about the fact that real genes must be multiples of three in length because they're made up of codons, or that there are some other stop codons or a few other complexities to start with. As you master more string and algorithm concepts, you'll add features to your program, making it more realistic with each step. As always, the first thing you want to do is work on an instance of the problem yourself. Let's take this DNA sequence and find the first gene in it. Let's find the start codon. Aha! There it is. Now let's start looking after it to find the stop codon, and we find it right here. That means that we want to take all this text representing the nucleotides in this region as our answer, the gene that we found. Now that we have worked an example, we should write down what we just did. First, I found the first occurrence of ATG. Then I started looking after the ATG for TAA. Last, I took all the letters between and including them as my answer, ATG, ATT, TTC, etc., all the way to TAA. Okay, now that we wrote down what we did for that specific problem, we want to generalize it. Why did we look for ATG? We always want to look for it. That's the start codon. What about TAA after ATG? We always want to do that too. That is the stop codon. And taking all the letters between and including them, we always want to do that too. The only thing that isn't really general here is the specific string we wrote down as our answer, which was more of a descriptive note to ourselves than anything else. Now that we have a general algorithm, we'd like to turn it into code, but we need to learn some new Java concepts first. How do we find ATG in a string? For that matter, how do we even represent or talk about the position of something in a string? And how would we get all the letters in a particular range in a string? You'll learn about these concepts, then you'll be ready to turn this algorithm into code. Thank you. 
To move forward implementing a gene finding program, we will explore a couple of important string topics in this video. The first of them is how we can represent the position of something in a string. To answer this question, we return to the recurring concept that everything is a number. That is, we will give each position in a string a numeric index. Notice that these numbers start at zero in the first position, not one. This may seem a bit odd, since we usually start counting from one, not zero. However, many programming languages number sequences of things, such as strings, from zero, because it makes some tasks easier, as we shall see later. These numbers, which describe the positions in the string, are called indices or indexes. Either word is an okay plural of index. For example, if I wanted to talk about this E, I might say the letter at index three is E. We have now answered the first question. We can represent the position with a number, and we can talk about the index in the string where we find the ATG. Now it is time to answer the second question. How could we get all the letters in a particular range, which we can now state a bit more precisely as between two particular indices? One option would be for you to write your own algorithm to do this. However, you can also use a built-in method of the String class. If there is a built-in method to accomplish a particular task, it is better to use it than to write your own. Not only does it save you work, but the built-in method has already been heavily tested by expert programmers, so you can be very confident that it works correctly. For this particular task, you want to use the built-in substring method. However, before we show you how to do that, let's take this example string and make it an actual Java string, assigned to a variable.
Here we have a variable declared with the name s, of type String, with an equals sign to make an assignment statement and a semicolon at the end of the line to end the statement. However, this isn't quite correct yet. We also need to put the string literal in quotation marks, as shown here. If we did not put these in and just wrote the word, Java would think dukeprogramming was the name of a variable and give us an error that it is undefined. By putting the text in quotation marks, we let Java know that we want a string with that literal text. So now we have a valid statement, which makes the variable s be the string dukeprogramming. Next, you can see an example of using the substring method. Here we declare another variable of type String called x and assign it the result of s.substring(4, 7). What do these numbers mean, and what does this method call do? The first number specifies the index in s from which we want to start making our substring. The letter at this index will be included in the resulting string. The second number specifies the index in s where we want to stop making our substring. The letter at this index will be excluded from the resulting string. The method will stop right before it gets to that letter. This may seem odd. Why would you want to specify the index after where you want to stop? There are a variety of reasons why this method and many others are designed this way. But one nice reason is that the length of the resulting string will be the difference between the two numbers. 7 minus 4 is 3. So we will get a 3-letter string as our answer. In particular, you will end up with this 3-letter string made up of the letters from the indices 4, 5, and 6 of s. So x will be the string p, r, o. While you are learning about this built-in method of String, let's take a moment to talk about a few other useful methods and what they do. You just saw substring and learned how it gives you the letters in a particular range of indices.
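As a quick sketch of the substring call just described, the snippet below uses the same string and the same indices. The class and method names are made up for illustration, and the charAt call is an extra standard String method, not one covered in the video, used here only to show the zero-based index of a single letter.

```java
class SubstringDemo {
    // Returns the letters of s from startIndex up to, but not including, stopIndex.
    static String piece(String s, int startIndex, int stopIndex) {
        return s.substring(startIndex, stopIndex);
    }

    public static void main(String[] args) {
        String s = "dukeprogramming";
        String x = piece(s, 4, 7);       // letters at indices 4, 5, and 6
        System.out.println(x);           // prints pro
        System.out.println(x.length());  // prints 3, which is 7 minus 4
        System.out.println(s.charAt(3)); // prints e, the letter at index three
    }
}
```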
Another useful method is .length, which tells you how many characters are in a string. The string s has length 15. Notice that for a string of length 15, the valid indices are 0 through 14. If you try to access an index outside of this range, your program will have an error with a StringIndexOutOfBoundsException. Another really useful method is .indexOf. You pass this method another string, and it tries to find the first occurrence of that string within the string you called the method on. For example, here we have asked the indexOf method to find the string program within the string s. You can see that the first occurrence of program is right here, starting at index 4. That is the value that the method call would return. Here is another call to indexOf: s.indexOf("g"). Can you figure out what this would return? This will result in 7, since the first occurrence of the string g starts at index 7. If you call .indexOf with a string that is not found, such as s.indexOf("f"), it returns negative 1. You can also give .indexOf a second parameter, specifying the index to start searching from. Here we have passed 8 as a second parameter. So the .indexOf method will ignore characters in the indices 0 to 7 as it searches for g. It will then find this g, which is the first one when you start looking from index 8. So this method call evaluates to 14. Another useful method is .startsWith, which tells you if a string starts with another string. Here you can see that s starts with duke, so this method call evaluates to true. Likewise, .endsWith checks to see if a string ends with another string. s does not end with king, so this method call evaluates to false. Wow, that was a lot of methods and a bunch of information. How would you know all of this? Will we be teaching you about every method in the String class, and should you be memorizing all of these details? Of course not. Programming is not about memorization.
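The method calls described above can be tried out with a sketch like this one. The class name, the constant S, and the helper isOutOfRange are made up for illustration. The helper uses charAt, a standard String method not covered in the video, simply as one way to touch an index outside the valid range.

```java
class StringMethodsDemo {
    static final String S = "dukeprogramming";

    // True when asking for the character at this index fails.
    static boolean isOutOfRange(String s, int index) {
        try {
            s.charAt(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(S.length());            // 15
        System.out.println(S.indexOf("program"));  // 4
        System.out.println(S.indexOf("g"));        // 7
        System.out.println(S.indexOf("f"));        // -1, not found
        System.out.println(S.indexOf("g", 8));     // 14, searching from index 8
        System.out.println(S.startsWith("duke"));  // true
        System.out.println(S.endsWith("king"));    // false
        System.out.println(isOutOfRange(S, 14));   // false, 14 is the last valid index
        System.out.println(isOutOfRange(S, 15));   // true, 15 is past the end
    }
}
```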
Although, as you program a lot, methods that you use commonly will naturally become familiar. Instead, you should learn how to make use of the language documentation, which describes all of the built-in classes and their methods. If you search the internet, you will find a result from docs.oracle.com. Oracle is the company that makes Java, and docs.oracle.com is the website where they host the language documentation. If you click on this link, you will end up with a page that tells you all about the String class. If you scroll down a bit, you will find a rather long list of all the built-in methods in String. Here are a few of them, including a few that you have just learned about, .indexOf and .length. These entries give a brief description of the methods, and if you click on one of the method names, you will get a more detailed description of what that method does. There is also a documentation page on the course site at DukeLearnToProgram.com. That page has simplified documentation of some important methods for quick reference. Great! Now you not only know about indices and strings and some useful built-in methods, but also how to learn about other methods when you need them. We are going to look at a gene in a DNA strand. We are going to use a very simple algorithm in order to find our gene. I have started some code here. We have a method called findGeneSimple. We haven't done much with it yet, but what I have also done down here is write a couple of DNA strands that we will use to test the code that we write. You can see here, we have String dna. We are going to print our DNA strand. We are going to call findGeneSimple and then print out our gene. And I have done this with four different strands of DNA. So let's write the code now. I have started by saying the resulting gene is going to be held in a variable called result, and I have set it to null, the null string.
And if you remember from the video, we are going to look for the start codon in the strand of DNA. The start codon is ATG, the string ATG. So we can use the new string functions that we learned about. I am going to start by creating a variable that will hold the index position of the start codon. We will call it startIndex, and we are going to look in the strand of DNA. We will use the indexOf function to look for ATG. The indexOf function is going to go through the string, stop when it finds ATG, and return the index location where it starts. So that will be the starting position of our gene. We also have to look for the stop codon, and that is TAA. So we are going to create a variable called stopIndex for the position of the stop codon. Again we will look in our DNA strand, dna, and we will use indexOf. But if you just look for TAA, it will start at the beginning of the string, and we want to look for the stop codon after the start codon. We know where the start codon is. It is in the variable startIndex, and we want to start looking past that. You can add a second parameter to the indexOf function, so we will add that and say start looking just past where ATG is, which is startIndex plus 3, since 3 is the length of ATG. So now we have the start index of ATG and the stop index of TAA, and we want the strand that includes those two and everything in between. I am going to call that our result. Again we will use a string function, the one called substring. The way substring works is you say I want a piece of a string, and I want to start here. Where do we want to start? We want to start where ATG is, so we will start at startIndex. Then the rest of our gene is going to be everything past ATG until TAA, which is at stopIndex, but we also want to include TAA, so we will add 3.
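Put together, the three statements just described might look like the sketch below. It is a minimal version under the assumption that both ATG and a later TAA are actually present in the string. The class name and the sample DNA string are made up for illustration and are not the strands used in the video.

```java
class FindGeneSketch {
    // Assumes dna contains ATG and, somewhere after it, TAA.
    static String findGeneSimple(String dna) {
        // Index where the start codon begins.
        int startIndex = dna.indexOf("ATG");
        // Index where the stop codon begins, searching only past the start codon.
        int stopIndex = dna.indexOf("TAA", startIndex + 3);
        // Include all three letters of TAA by ending at stopIndex + 3.
        String result = dna.substring(startIndex, stopIndex + 3);
        return result;
    }

    public static void main(String[] args) {
        String dna = "AATGCGTAATATGGT";  // hypothetical test strand
        System.out.println("DNA strand is " + dna);
        System.out.println("Gene is " + findGeneSimple(dna));  // prints Gene is ATGCGTAA
    }
}
```

In the sample strand, ATG begins at index 1 and the first TAA after it begins at index 6, so the substring runs from index 1 up to, but not including, index 9.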
So that means take our substring from where ATG starts and go all the way to stopIndex plus 3, which is the first character after TAA. The way substring works is it says start at the first parameter, startIndex, and then go all the way up to, but not including, the second parameter. And then let's compile this and see if it works. So it compiles fine. So let's go over here and we will create a new find gene object, and now we will run the test method that we wrote. And there it is. So we can see here is the first DNA strand. Look for ATG, there it is, and then look for TAA, and there is the first TAA that is after it, and you can see there is the gene that we found right here. The next example: here is ATG, we look for TAA all the way here, and you can see here that is the gene we found. Here is the third example, ATG all the way to TAA, and you can see we got a big gene. And then here is the last example, ATG, TAA, so that works well too. But what happens if ATG is not there, or what happens if TAA is not there? So let's look at our example here and let's change one of our strings. Let's change this string, we will just pick this one, and let's change it to a string that does not have ATG in it. Let's see what happens if we run that example. So we are going to come back over here and we will run our test method, and we got an error down here. You can see it says StringIndexOutOfBoundsException, String index out of range: -1. What happened? Let's go look at our code. We looked for ATG and we didn't find it, and what the indexOf function does, if it doesn't find the string you are looking for, is return minus one. So startIndex has the value minus one, and then we are trying to build a string starting at minus one, and minus one doesn't exist, so we got an error. So what are we going to do? We need to fix this code. So let's check right after we ask for ATG. Let's put an if statement in there to make sure that ATG was there. We can ask if startIndex is equal to minus one. That means there is no ATG. In
that case, there is no gene. If there is no ATG, there is no start. So what we will do is just return the empty string. If we compile this now and run it, we will create find gene and then we will run test find, and it ran, but there is no gene here. That's because there is no ATG. Let's try another example. We will change our code again to give another example. Let's just change this one here. Let's create a string where we have ATG but we don't have TAA, so we have the start codon but not the stop codon. I will create a string, C G A T G G T T A A A A G T. So there is no... there is one, so let's get rid of it. G, there we go. So now we have ATG right here, and to the right of it there is no TAA. Let's compile and run it, and you can see there is no gene for the second one, because there is an ATG but there is no TAA. Just like we put in a check for the case where there is no ATG, we should do the same for the case where there is no stop codon, no TAA. So let's add that code too. We would add it right after we get the value for the stop codon, so right here we can check and see if stopIndex equals minus one. At that point we know there is no stop codon and so there is no gene, and we can again just return the empty string. That is a safe way to handle the case where there is no start codon or no stop codon. Let's add a comment here: this is the case for no TAA. We will compile our code and just double check to make sure that works. So it compiles, and we will run it, and now our first string is good, we find a gene. For our second string there is no gene: we have an ATG but we don't have a TAA. This string looks good. This string has no ATG, and it actually does have a TAA, but that doesn't matter because it has no ATG, and so there is no gene for that one either. So anyway, that is the way to find a simple gene in a strand of DNA. We look for the start codon, and then if there is a start codon, we look past it for a stop codon, and if we find both of those, then we can return the start codon, the stop
codon, and everything in between as the gene. Thank you. So this is the first, rather oversimplified version of this program. It just looks for ATG and then TAA and gives back everything in between. But real genes have to be a multiple of three in length, since they are made of codons, each of which is three nucleotides long. For example, this string is a valid gene. You can see here how it can be divided into codons, starting with ATG and ending with TAA. However, this string is not valid. Even though it has ATG and it has TAA, when we look between them we don't find a valid sequence of codons. Now let's make our algorithm a little bit more realistic. You will fix this aspect of your algorithm. It will still be a simplification, but a bit more realistic. At this point, it's really good to note that starting with the simple version and adding features is not only a useful technique for you as you learn new concepts, since it lets us introduce a few concepts at a time, but is also important when writing real, large, complex programs. Okay, so you just saw two examples of DNA strings, one that has a valid gene and one that does not. But how can you tell in general? What is the change you need to make to your algorithm? It might be useful to show the indices, as we have shown here, and highlight the location of the start and stop codons. Do you see how to algorithmically tell the difference? If not, that's fine. It can be hard to spot patterns, but you get better at it with practice. One great technique to find patterns is to make a table. You might remember that we did this in some of our examples. We can then add more examples to our table, as I'm showing here, to help us see the pattern, some with yes answers and some with no answers. Maybe you see the pattern now, or maybe it is still hard to see. What can you do if the pattern still isn't clear? Maybe adding more rows will help, or maybe not. Instead, we might start exploring the relationships between the items in the table. Here we have added another column for the
difference between the index of the stop codon and the index of the start codon. The ones that have yes answers have differences of 6 and 12, and the ones that have no answers have differences of 4 and 11. What do 12 and 6 have in common that 11 and 4 do not? We might be able to think of a lot of things. 6 and 12 are both multiples of 6, but that doesn't make sense in the context of this problem. If we did more examples, we could find ones with yes answers that have differences of 3 or 9 or 15. However, 6 and 12, as well as 3, 9, and 15, are all multiples of 3. This relationship does make sense, as we know that the length must be a multiple of 3. Now that you know the relationship that you want to look for, you need to do some math in Java. You need a way to ask if the difference between two numbers is a multiple of 3. If you took course 1 with us, you might remember the mod operator (written % in Java), which gives you the remainder when you do division. x mod y means divide x by y, but give me the remainder, not the quotient. This can help you with the problem at hand, since a number that is a multiple of 3 has a remainder of 0 when divided by 3. That is, if x mod 3 is equal to 0, then x is a multiple of 3. You can use other mathematical operators in Java: plus, minus, times, divide, they're all valid. You can also use equals-equals (==) to see if two numbers are the same, not-equals (!=) to see if they're different, and less than, less than or equals, greater than, or greater than or equals to check for inequalities. You can also combine simpler expressions into more complex expressions. This expression checks if (a minus b) mod 3 is equal to 0. It is evaluated by first evaluating a minus b, then taking that result and doing mod 3 on it, then finally taking that result and checking if it's equal to 0. This is pretty much exactly what you need for the problem at hand, to see if the difference between two things is a multiple of 3. While you are learning about math in Java, you should know that there are different types of numbers. Actually, there are some
variations on these, but you don't need to worry about that right now. One type of number is int, which represents integer numbers like negative 2, negative 1, 0, 1, 2, etc. Ints can't have a fractional part. The other type is double, which represents real numbers, ones with fractional parts like 1.2 or 3.457. Of course, you could also represent 3, which is 3.0, with a double. You only want to use doubles when you need to, since they have some behaviors that can be really confusing to novice programmers. One word of caution about integers, however, is that math on integers always yields integers. So what do you get if you divide 5 by 2? If you are thinking 2.5, remember that you can only get an integer result, so you get 2. What about 100 divided by 3, times 4? You will get 132, since 100 divided by 3 is 33 and 33 times 4 is 132. So what if you do 100 times 4, divided by 3? Seems like you should get the same answer, right? Well, actually, now you get 133. Why? 100 times 4 is 400, and 400 divided by 3 is 133. These sorts of issues should not come up in these courses, but are good to be wary of if you are doing integer division. Another thing to know about math in Java is that it has order of operations rules, like math. In programmer speak, these rules are called precedence and associativity. As with math, a plus b times c means to do b times c first, then add that result to a. What if you have mod? It has the same precedence as division, meaning it takes place at the same place in the order of operations. So a minus b mod 3 means to do b mod 3, then subtract that result from a. This is why we put parentheses around a minus b earlier, when we wanted to do the minus first and then take its result mod 3. Comparisons for equality happen very late in the order of operations. a plus b equals-equals c minus d means to first do a plus b, then do c minus d, and finally compare the two results to see if they are equal to each other. These rules are typically like they are in math: multiplication and division come first, then addition and
subtraction. Also like math, you can use parentheses to group things so that they come first. If you aren't sure what order things will happen in, you can always use parentheses to be explicit and be sure you get what you want. OK, now that you know about math in Java, it's time to go improve your gene finding algorithm. Thank you. Welcome! I'm excited to work with you as you extend the programming and problem solving capabilities you've developed for finding genes in a strand of DNA. You'll work on finding genes using a model of stop and start codons that more closely resembles the work done by genome and computational scientists as they work on problems such as personalized medicine and understanding genetic issues with many species. You'll learn about new classes from the edu.duke library, and you'll practice with new programming constructs that will allow you to repeatedly execute program statements until a problem is solved, like finding all genes in DNA or all links on a web page. Let's get started. I hope you have fun. Thank you. Welcome back. In this lesson, you're going to learn a powerful new programming construct, an indefinite loop. You'll also learn about another iterable from the edu.duke library. You've already solved problems using the FileResource and URLResource classes from the edu.duke library. Using iterables made it possible to repeatedly access data stored in a file on your computer or via a URL from the World Wide Web. You'll use this concept of repetition to find all the genes in a strand of DNA, a problem that genome scientists solve as part of their work, and a problem that models finding all links in a web page or all the YouTube videos you might want to watch about cats or dragons or anything else. The key idea here is that you'll use the algorithm from a previous lesson that found one gene in a strand of DNA, an algorithm you've tested and have confidence in. You'll repeatedly apply this algorithm over the entire strand of DNA to find all genes, not just one. You'll also
learn about a new iterable that lets you store intermediate results. Instead of printing all the genes you find, you will be able to store them, so that you can look for particular genes after you've found strings that could be genes. For example, by storing the results of a gene search, or of other kinds of search, you can write separate methods to process them rather than including the processing with the finding. This separation of concerns (find genes, process genes, filter any of those genes that have specific characteristics) is a hallmark of good software engineering: writing a method to do one thing rather than doing several things. This separation allows you to reuse code more easily and to develop code more easily. The storage you use will be an iterable you can add to while the program is running. To be more concrete, you're going to solve a variation of a problem you solved once before, the problem of finding a gene in a strand of DNA. Genes are found in a strand of DNA at different sites within the DNA. In a previous lesson, you developed code for finding one gene, like the region shown in red here. You found this gene by looking for particular markers called start codons and stop codons. These markers were used to identify the part of a string that could be a gene. You could have used a similar algorithm to find where a link occurs in the HTML text of a web page, looking for a href rather than ATG, for example. DNA typically carries more than one gene, however, so rather than finding this one gene that's marked in red, you'll use programming techniques to find all the regions in a strand of DNA that could code for a gene: regions that begin with the start codon, as shown here, and that end with one of three stop codons. In the next group of lessons, you'll learn many things about Java and programming as you practice solving this gene finding problem. You'll learn how to repeat a process many times, even when you don't know how many times that is. You'll do this by using a while loop, a new
kind of loop that complements the for loop with iterables. You'll learn about the StorageResource class from the edu.duke library. Using a StorageResource object will allow you to add selected values to the storage and then access them in your code using the standard for loop with iterables you've used before. This storage will also be a preview of future programming techniques you'll learn about in using arrays to store values. You'll also practice with developing if statements and boolean expressions that use what's called short-circuit evaluation. This will be an important part of the practice and knowledge you'll gain as you move toward becoming a better programmer and problem solver. Thank you. As you have been learning about strings, you have been improving on your algorithm to find genes in DNA. However, let us take a moment to think about what your algorithm will do on this string. It will find the start codon ATG at index zero. Then it will find the stop codon TAA at index eight. It will then check if the distance between them, which is eight, is a multiple of three. Because eight is not a multiple of three, your algorithm will conclude that this is not a valid gene. Between the ATG and TAA there is one full codon, ATC, and two-thirds of another codon, GC. But if you were to keep looking past this TAA, you would find another TAA at index fifteen. Now the distance between the ATG and TAA is fifteen, which is a multiple of three, so this is a valid gene. The first TAA that we found was not actually a codon, but rather pieces of two adjacent codons, the T from GCT and the AA from AAT. Your next improvement to the gene finding algorithm is to add this functionality: to make the algorithm keep looking until it finds a stop codon that is a multiple of three away from the start codon. Having just worked that example, let us now do step two of the seven-step process and write down what we just did. The first thing we did was to find the ATG. Then we found the first occurrence of TAA, which was
right here at index eight. Then we checked if the distance between them was a multiple of three or not. In this case it was not, so we found the next TAA after this first one. The second one is right here at index fifteen. Then we checked if the distance between this TAA and the start codon was a multiple of three. It was, so all of the substring from zero up to eighteen was our answer. In this particular set of steps, we checked in two places to see if the distance was a multiple of three. If this works in the general case, you can just implement this algorithm with familiar if-else statements. However, do we always only need to check twice? Let's look at a different DNA string. With this DNA string, we would need to check three times. The first two TAAs are not a multiple of three away from the start codon, but the third one is. Would checking three times be enough? Could we have to check four times? Five, ten, fifty times? This raises the question of how many times we have to check in general, and the answer is that we cannot pick a particular number of times. Even if you wrote fifty if-else statements, we could come up with a DNA string that has more than 50 TAAs that are not a multiple of three away from the start codon before finding a valid one. Instead, we need to write our algorithm so that it repeats the checking however many times it needs to. As you have seen before, when there is repetition in your algorithm, you need a loop when you translate the algorithm into code. To express your algorithm with repetition, you will need to make the repetitive steps the same and figure out what to loop over. Previously, you have seen for loops, which iterate over the elements in some iterable, such as pixels in an image. Now you're going to learn about a new kind of loop, known as a while loop, which lets you iterate as long as some condition holds. Before we try to generalize these steps by finding repetition, let's be a bit more precise about what we did. We found the first ATG at index zero. For the first TAA, we started
looking at index three and found it at index eight. We checked if eight was a multiple of three. It wasn't, so we started looking at index nine for the second TAA, and we found it at index fifteen. We checked if fifteen was a multiple of three. It was, so everything between was our answer. Now let us take these steps and generalize them. We looked for ATG here. Why was that? We always want to look for ATG because it is the start codon. What about the fact that we found it at index zero? We're not always going to find it at index zero. However, we are going to want to use that information, so let's give that a name. When we turn this into code, it will be a variable that we assign to and then use later. In particular, we'll call it startIndex. What about looking for TAA? We will always want to look for TAA, since that is the stop codon. Will we always start at index three? Probably not. Why did we start at index three here? We started there because it was right after the start codon that we found. In the general case, this would be startIndex plus three. We won't always find it at index eight either, so let's give that a name too. We'll call it currentIndex. Let's also be a bit more specific about the distance between them. It is currentIndex minus startIndex, which happens in both of these steps. Next, you aren't always going to start looking at index nine, but why did we start at nine here? If you look back at where we worked the problem and wrote down our steps, we started at nine because the previous one started at eight. In our generalized algorithm, we named the previous location currentIndex, so we can start from currentIndex plus one. We also should name the location where we found it. Should we give it a new name, such as nextIndex, or should we just update an existing name, such as currentIndex? In this case, we want to update currentIndex, since that represents where we have found the most recent TAA. If you did not realize this right away and gave it a different name, you would
realize it later on as you try to make the steps uniform so that you can express the repetition. Finally, we'll generalize the last step to just indicate that the text between them is the text from startIndex to currentIndex plus three. Now these steps look repetitive. The repetition may be a bit hard to see, since it only happens twice, but if you wrote down the steps for a string with more TAAs that don't work, you would see that you do these steps over and over again. To make this repetitive, let's write it down like this. Notice that steps four, five, and six are what we will repeat. We've slightly adjusted the steps from before to reflect the choice we were making in step four and the two possible outcomes in steps five and six. However, we have left the conditions under which we repeat these steps blank here. How do we know when to stop repeating them? Also, what would you do after you stop repeating this loop? We would stop if we run out of TAAs. If that happened, currentIndex would be minus one, which you know from having learned that you get minus one when you cannot find something in a string. If you were to encounter this case, it would mean that there is no valid gene in the string, so you should give an answer of the empty string. If you did not see this right off, what could you do to figure it out? You should work more examples until you understand the pattern. Now your algorithm is generalized, but you'll need to learn about while loops before you can translate this into code. Thank you. To implement our improved gene finding algorithm, you need to learn about while loops, which generally come up when you have a step in your algorithm like "as long as the condition is true". You can see an example of a while loop here. Let's look at this example to delve into the syntax and semantics of the while loop. All while loops start out with the keyword while. Then they have some condition, sometimes called a guard, in parentheses. This is the condition that you want to check to see if you should
continue doing the loop. Then you have the loop body in curly braces. These are the statements that should be executed on each iteration of the loop. Let's walk through this example to understand the semantics of the loop, what it means. For this example, let us assume that x was previously declared and initialized to 0, and y was initialized to 7. When execution reaches the while loop, the first thing that happens is Java evaluates the condition: is x less than y? In this particular case, that means is 0 less than 7? 0 is less than 7, so the condition evaluates to true. Since the condition is true, execution will enter the loop body and continue executing statements there. The first statement prints x, which is 0, so we would output the line 0. The second statement sets x to x plus 3, to update the value of x to be 3. Now execution has reached the end of the loop body. The close curly brace marks its end. Then execution goes back up to the start of the loop, and we are right back where we started, but the variables have different values: x is 3 instead of 0. We continue following the same rules. Evaluate the condition: 3 is less than 7, it is true, so we enter the loop body, print x, which is 3, and update x to be 6. Now we have reached the end of the loop again, so we go back up to the start of it and again follow the same rules. Check the condition, it is true, so we enter the loop body, print x, which is 6, and update x to be 9. Now we have reached the end of the loop body again, so we go back to the start and follow the same rules. Now when we evaluate the condition, something different happens. Is 9 less than 7? No, so the condition is false. Instead of entering the loop body, we go past it. We would then continue executing whatever statements come after the loop. Great! Now you have learned the basics of while loops: their syntax, the grammatical rules for writing them, and their semantics, their meaning. Now you are ready to go code your improved gene finding algorithm. OK, now you have learned about while loops and are ready to
turn this algorithm into code, which will search for a TAA that is a multiple of 3 away from the ATG, even if there are other TAAs in between which are pieces of two different codons stuck together. So here is the algorithm that we came up with, and now we are ready to turn it into code. We are going to find the first occurrence of ATG and call its index startIndex. So, as you are hopefully becoming familiar with by now, we are going to say startIndex equals dna.indexOf ATG, because that is going to find us the first index of ATG. Our second step says find the TAA starting from startIndex and call this result currentIndex. currentIndex equals dna.indexOf, and we are looking for TAA, and we want to start from startIndex plus 3. This should also hopefully be becoming familiar to you, with indexOf starting from a position. Now we are going to say as long as currentIndex is not equal to negative 1, which, as we have just learned, is a while loop: I want to repeat some steps as long as some condition is true. So while currentIndex is not equal to negative 1, I want to do these steps here. I have indented my steps to match the grouping that was in our algorithm in the slides, and I am going to just go ahead and put the curly brace here so I won't get confused later. As long as currentIndex is negative 1, or rather is not negative 1, we want to check if currentIndex minus startIndex is a multiple of 3. You have seen how to do this previously: we can subtract, currentIndex minus startIndex, take that mod 3, and see if it is equal to 0. And we said if so, we want to do one thing, and if not, we want to do another thing. As you have probably gotten familiar with by now, "if not" is going to be else: we want to do some other thing in this case. So if so, the text between startIndex and currentIndex plus 3 is your answer. Anytime a method knows its answer right away and is done, we are going to return that answer back to whoever called us, which finishes
the method and gives back the answer. We want return, and we want the text between startIndex and currentIndex plus 3. What method would you use for this? Hopefully you have become familiar enough with string methods by now that substring is what comes to mind. So dna.substring, startIndex, currentIndex plus 3. And then we said if not, update currentIndex to the index of the next TAA, starting from currentIndex plus 1. Anytime we want to update a variable, we are going to use an assignment statement, and we can find the next one after that again using indexOf to search from a specific position: indexOf TAA from currentIndex plus 1. So I just wrote that step down as it was. My brace is not indented right, so I'm going to fix that. And then if we get to the end of this loop, if we keep going and currentIndex is negative 1, meaning we couldn't find any more TAAs, then our answer is going to be the empty string. All right, so I'm going to save that, I'm going to go over here, and I'm going to hit compile. It says "reached end of file while parsing". Oops, I messed up in here somewhere. I'd written some test cases before we started. You'll notice that it's highlighting this curly brace, and if I scroll up, it's highlighting this curly brace. When I wrote these test cases, I forgot to put the curly brace that ends the method. Fix that really quickly, compile it, and... findGeneSimple. I copied and pasted this testing code and changed the method name and didn't change that, so we'll fix a couple of careless mistakes there. This is the danger of copying and pasting. You should never do it. Now it really will compile. So I'm going to make one of these, and you'll notice here that I've written a little test method like we've had in the previous ones. Since things are getting a little more complicated with our test cases, I put these comments here just to mark where the codons are. So this ATG is one codon, and then I've just marked here that this CTG is a codon, and so we can see that this TAA
here, for example, is a whole codon, and this TAA here is not a whole codon. So we would expect this method to find this gene right here, and similarly for these other ones. I'm just going to run this method really quickly. Okay, and for this first one, what did we expect? We expected ATGCGTAATTAA, and we got ATGCGTAATTAA, so we got exactly what we wanted. Notice this is where that while loop we wrote came into use, because this looks like a TAA, but we would have found that it wasn't a multiple of three away, and we would have kept going and found there actually is one later on. Then similarly we've done this other string, which is a little bit longer and has some more things that look like maybe they're TAAs, but they're not. It does have a TAA at the end, so we get that really long gene there. And then at the end here, if you look, we don't actually have any TAAs, and we get the empty string, which is what we want. So there we go. We've turned this algorithm into code using a while loop, and now it will look for any TAA which is a multiple of three away, even if there's one earlier that's not a multiple of three away, so that we can get the right answer. All right, so far you have developed an algorithm that would find the start and stop codons, figure out that the length is a multiple of three, and return the DNA string for this gene. Now it is time to add another layer of realism, and thus complexity. There are actually three different stop codons: TAA, TGA, and TAG. So far your algorithm has only looked for TAA, but now it is time to make it look for the first one of any of these three stop codons that is a multiple of three away from the start codon. So which one of these stop codons is the one you want? Well, the first TGA is not a multiple of three from the start codon, so that isn't the one you want. The second TGA is also not a multiple of three from the start, so you don't want that either. That leaves you with TAG and TAA, which are both multiples of three
away. You want the one that comes first, in this case the TAG codon at index 12. Okay, so now you know what the problem is. Let's solve it. To do this, let's go back to our algorithm that we have from before. The only thing that was specific to looking for TAA was these two occurrences of TAA in steps three and seven. Could we start from this algorithm and have most of our work done for us? The best way to do this is to split the problem up. We want to abstract out the part that searches for a stop codon into its own method. We'll call it findStopCodon, and it will take the DNA string to search in, the index to start from, and the particular stop codon string, such as TAA, TGA, or TAG, to search for. The algorithm will need a few changes, such as returning the index instead of the text, since we need the three stop codons, not just TAA. However, the basic mechanics of the algorithm, searching for a codon that is a multiple of three from the start, remain the same. We'll come back to those changes in a minute. For now, we'll just assume we have a findStopCodon method that works. Once we have abstracted that out into its own function, we could use it to find the TAA stop codon, call the method again to find the TAG stop codon, then one more time to find the TGA stop codon. Note how this corresponds to what we did by hand: we identified the positions of the TAA, TAG, and TGA stop codons. It is a bit different from our example by hand, since findStopCodon will only give us a position that is a multiple of three away from startIndex, whereas we showed a TGA that was not a multiple of three for the purposes of illustration. Now that we have those three positions, we want the one that comes first, so we would just need to take the minimum of three values, which we will call minIndex. Finally, our answer is the substring from startIndex to minIndex plus three. Now let's look at what we need to do to the algorithm that we abstracted out into findStopCodon. First, these will no longer always be TAA. Instead they
will be the stopCodon parameter, which tells us the particular stop codon we are looking for. The other change is that we want to give back the index where we found the stop codon, instead of the text between the start and stop codons. Why? Our other algorithm needs these indices to compare, to figure out which one to use before it gets the text. For a valid stop codon, in step four, we can just give curIndex as our answer. However, what should we give back to represent no valid index found in step six? Negative one is often a good choice to indicate no valid index, but let's see how this would work with the way we are using the result of this function. If we were to return negative one, we could make it work, but we would have to change this code to do more complex comparisons than just taking the minimum of these three values. You'll see this approach later and learn a new concept in the process, but for now let's make this work such that we can just take the minimum here. Instead, we can return the length of the DNA string, since that is larger than any valid index. Of course, now that we are doing that, we should explicitly check for the case where no valid stop codon was found. If minIndex is the length of the DNA string, none of the three stop codons was found. Okay, great, now we have an algorithm. Let's go turn this into code. Thank you. Hi, welcome to another version of gene finding. In today's episode, we're going to look for three different stop codons. In the previous coding exercises, you've seen code that found one stop codon, making sure it was a multiple of three away from the start codon. In this version of the code, which you've already seen described previously, we're going to look for three different stop codons. As you can see here, we have TAA, TAG, and TGA, and in order to find each of those, we're going to write another method, a method called findStopCodon that we saw outlined before with our seven-step method. Up here I've got this method findStopCodon, and it has three parameters:
dnaStr, the string of DNA in which we're searching for a gene; startIndex, the location at which we're going to begin the search; and stopCodon, the specific codon we're going to be looking for. We've got the code from our seven-step process, and what we're going to do here is write the code that goes along with this. You can see here, as we saw in the previous version of the code, I'm going to create a variable currIndex, and I'm going to make it the location of the first stop codon starting at startIndex plus three. So, as we've seen before, the way we do that is create an integer variable currIndex, and we write currIndex gets the location in dnaStr of the stop codon starting at startIndex. That's exactly what it says here: find the stop codon starting from startIndex plus three. So I'd better make sure I'm doing that correctly, and store that value in currIndex. Now, as long as currIndex is not equal to negative one, and as you've already learned, that's a while loop, so we're going to say while currIndex is not equal to negative one. And when I type the right curly brace, as you see here, I'm going to make sure I put in a left curly brace at the same time, indented the same way. One thing I can do there to make sure that I've got it right is to compile my program. So I'm compiling it; you can see I've got no syntax errors, so I know I'm in good shape. Now, as long as currIndex is not equal to negative one, I've just coded that, check to see if currIndex minus startIndex is a multiple of three. I'm going to store that difference; I'm going to say int diff gets currIndex minus startIndex. Now, you don't need a new variable for that. As you saw in the previous video, you can just say if currIndex minus startIndex mod three is equal to zero, but this is sometimes easier conceptually: to say if diff mod three is equal to zero, that means, and you can see I've already got my curly braces in there, return currIndex, because it's my answer. So I'm just going to say return currIndex. That's what it says here:
check if currIndex minus startIndex is a multiple of three, and if so, currIndex is the answer; return it. So I've got those two things taken care of. And it says if not, update currIndex; so "if not", that would be else. If you think carefully, you might see that I don't need the else, because I've already returned, but it's probably easier to write it that way in terms of thinking about it. It says update currIndex, looking for stopCodon again, starting from currIndex plus one. You can see up there how I looked for it, so I'm going to do the same thing: currIndex gets dnaStr.indexOf, I'm still looking for stopCodon, and remember that's a parameter to this method, and I'm starting, according to our comments, at currIndex plus one. So, currIndex plus one. And if we exit the loop, we didn't find the stop codon, so we return dnaStr.length(). You can see here that the loop is going to continue while currIndex is not equal to negative one. If it is equal to negative one, we've exhausted the string, the DNA string in which we're searching, and so it says return dnaStr.length(). As we'll see later when we write this code, that's to ensure that when we find the minimum value, our code works properly. So I'm going to make sure that my method looks right: find the stop codon, and I started at startIndex plus three, so that's the last comment I have. I'm going to get rid of that comment now. So I've stored in currIndex the location of the stop codon; while that's not negative one, if it's a multiple of three, I'm done, and if it's not, I need to keep searching. My class compiles, but just compiling doesn't indicate at all that I'm done. I've written some test code down here, testFindStop. As you saw in the previous coding video, I've created a strand of DNA, and I'm going to look for TAA, which is one of the stop codons, and I've got it in two different places. When I run the method, you'll see what it does, and then I'll describe how that testing works. So here's my workbench. I'm going to right click on this and
make a new object. It comes up in the object workbench. I'm going to right click again, testFindStop, and you can see it printed tests finished. That means all the tests worked correctly, because if there was an error, my print statements would have indicated that. Let's look at that testing code so you can see how I did that. In testFindStop, the first thing I did was look for the stop codon TAA starting at index zero, and you can see here I've got the indices printed above the string. Here TAA is at index nine. Nine minus zero is in fact a multiple of three, so it should have returned nine, that it found TAA, and if it wasn't nine, I'd get an error. To see how it might work if it returned the wrong thing, suppose I said if dex is not equal to ten. Now that would be a mistake, because we know it returns nine, and it's supposed to return nine. You'll see in this case, when I run it, that it will print something. Right click to create the object, right click to invoke testFindStop, and you can see error on nine. Now that wasn't an error, because I knew I was supposed to get nine. All these tests print something if there was an error, and if there's no error, they don't print anything. My last print statement prints when the tests are finished. So what we have now is this method findStopCodon, written and tested, and we're ready to move on to the next method, which was the same method that you saw before for finding just a single stop codon. Now we're going to find any occurrence of a stop codon, any occurrence of three different stop codons: we're going to find TAA or TAG or TGA. This will be a very straightforward use of the abstracted-out method findStopCodon that I've already done, so I'll do this very quickly. Find the first occurrence of ATG, the start codon, and store that in startIndex: int startIndex gets dna.indexOf of ATG. If startIndex is negative one, I'm done; there's no start codon, so I can't find a gene. So if startIndex equals negative one, return negative one, no, sorry, return the empty
string. That's what it says right here, and this method returns a String, so I'm returning the empty string. Now I'm going to store in taaIndex the result of this method call, so I'm just going to copy and paste this method call, even though, as you've learned, copy and paste does not always work as you intended. So, store in taaIndex that method call, and then I'm going to store in tagIndex the same thing, but it says use TAG, and then I'm going to store in tgaIndex that call with TGA. I'm looking to make sure that my copy and paste worked properly, because copy and paste can sometimes be your friend, but sometimes it can get you into trouble. So I've done that: I've used taaIndex, tagIndex, and tgaIndex. I've got all those variables. The next comment says store in minIndex the smallest of these three values. So I have to find the smallest of all these values, and one way to find the smallest value is to use the min function in the Math library. So I can say int minIndex gets Math.min of taaIndex and tagIndex. That's the smallest of these two; maybe I'll call that temp, and then I'll store in minIndex the minimum of temp, which is the smaller of the first two, and the last one, which is tgaIndex. Let me go over that again. I've found the minimum of taaIndex and tagIndex, that's these two, and then I found the minimum of that value, which is the smallest of the two, and my last index, tgaIndex. And it says if this is dna.length(), I return the empty string. So if minIndex is equal to dna.length(), return the empty string. Otherwise, the answer is the text from startIndex to minIndex plus three, and as you saw before, that uses the dna.substring method, from startIndex to minIndex plus three. I'm going to make sure I've got that right. Compile: cannot find symbol minIndex, and that's because I spelled that wrong there. Now it compiled with no syntax errors. Just compiling doesn't mean my code works; I really need to test it. So we'll leave it to you to test this. Just as I had a test method here, you could write
a test method to test these. It'd be very similar to the one that you saw in the previous video, but it could use TAA and TAG and TGA. Once that test method is done, I'll be confident my method works correctly. One final thing: this use of Math.min to store in temp and to store in minIndex. Just to let you see it, it would be possible to do that on one line. I could say Math.min of taaIndex and Math.min of tagIndex and tgaIndex. It's possible to chain the occurrences of Math.min together like that. I'll leave that as a comment so you can look at it and think about it. And that's it for coding with DNA and three different stop codons. Have fun. Now you have an algorithm which works with any of the three stop codons. However, let's explore how you can make this code work if we decided to have findStopCodon return negative one when there's no valid stop codon, rather than the length of the string. Note that this is a perfectly valid design choice and mirrors what the .indexOf method does, but we need to learn a new concept to make it work with our code. You'd want to change line six of the gene finding algorithm to reflect this choice. You cannot just take the minimum, since negative one is smaller than any valid index. So what you want to do is pick the smallest number that's not negative one. Let's look at a few examples. Here, taaIndex is negative one, tgaIndex is three, and tagIndex is six. What index should you pick for the stop codon? Well, we want three, because it's the smallest number which is not negative one. Notice we can't just pick the minimum one, because that would give us negative one, which is not a valid value here. What if we had five, negative one, and eight? We'd want five. With ten, four, and negative one, we'd want four, and with negative one, negative one, and eleven, we'd want eleven. Let's think about how we can express this decision process algorithmically. In the first example, we compared these two numbers first. Even if you think you're comparing all three at once,
you're really comparing two at a time, very quickly and perhaps without thinking about what you're doing. Of these two values, we prefer three, since the alternative is negative one, and then we compare three against six. We choose three, since it's smaller than six. In the second example, we'd compare five to negative one; then we'd choose five and compare this against eight. In this choice, you'd pick the smaller, which is five. In the third example, you'd compare ten against four. You'd choose four because it's smaller, and then you'd compare four against negative one and choose four. In the final example, you're comparing negative one against negative one. It doesn't really matter which you pick; they're the same. You then compare negative one against eleven and choose eleven. You should think about what you would do if all three values were negative one. What would that mean? It would mean that no valid stop codon exists. We'll have to use that in our code. Now, how do we think through how we made these choices? We're only going to look at the logic involved in making the choice between one pair; that same logic will work with the second choice. In the first example, we picked tgaIndex because taaIndex was negative one. In the second example, we picked taaIndex because tgaIndex was negative one. In the third example, where neither is negative one, we picked tgaIndex because it's smaller than taaIndex. And in the final example, it didn't matter which we picked, since both were negative one. Let's write down what our logic is for making that selection. If taaIndex is negative one, we'd want to pick tgaIndex. But that's not all there is to it: we'd also choose tgaIndex if both tgaIndex is different from negative one and tgaIndex is less than taaIndex. Notice how we've expressed this logic with OR and AND. Logical connectives make complex conditionals out of simple ones. Now let's put that logic into the algorithm we're working on. We'll use that logic to pick between taaIndex and tgaIndex, and we'll store the best
choice there is in a variable called minIndex. Then we'll use the same logic to choose between minIndex and tagIndex. If minIndex is negative one, then there were no valid stop codons, so we'll give back the empty string. Otherwise, we've found a gene. Now let's see how we express these ORs and ANDs in Java. You can express AND with two ampersands. This is Shift-7 on most US keyboards. We can see in this example: if X is less than Y and Y is less than Z. You can express OR with two vertical bars, also called pipes. On most US keyboards, this symbol is Shift-backslash. And we can see an example here: if A is greater than B or C is less than D. In the particular case of the algorithm we're working on, you could express the step as shown here. Notice how the OR corresponds to the two vertical bars and the AND corresponds to the two ampersands. AND and OR have special rules called short-circuit evaluation, which basically means that if Java can figure out the result of an entire expression involving AND or OR by evaluating only the first operand, then Java will skip evaluating the second operand. Let's look at an example. Suppose X is 8 and Y is 1. X is less than Y is not true, because 8 is not less than 1, so X less than Y evaluates to false. There's no reason to evaluate Y less than Z, because whether it's true or false, the whole AND expression will be false, because false AND anything is false. For OR, consider this example, and suppose A is 3 and B is 1. Here, A is greater than B is true, because 3 is greater than 1, so the whole OR expression will be true regardless of whether C is less than D or not. So there's no point in evaluating the expression C less than D, because true OR anything is true. Why is short-circuit evaluation important? If you skip evaluating Y less than Z in this example, it doesn't really matter so much, but it's much more useful when the second operand could crash your program if it's evaluated. For example, here we're checking if X is less than the length of a string and that the character at
index X of the string is the letter A. If the first condition is false, then X is not a valid index; it's beyond the end of the string, so trying to get the character at index X with the .charAt method would crash the program with a StringIndexOutOfBoundsException. But fortunately, this line of code is safe because of short-circuit evaluation: the .charAt method will never be called when X is greater than or equal to the length of the string. Relying on short-circuit evaluation is a great example of defensive programming and one of the tools in your Java programming toolkit. Have fun. Hi, welcome back to gene finding yet another time. In this version of the program that we're writing to find genes, we're going to make one small change to what we return when we look for a stop codon, and that change is going to mean that we have to use some complex boolean expressions, using ANDs and ORs, to make our code work properly. So, as we've already seen, the one small change we're going to make is that instead of returning dnaStr.length() to indicate that no stop codon was found, we're going to return negative one. That's a good value to return to indicate that nothing was found, because now our code mirrors what, for example, the indexOf method in the String class uses: it uses negative one to indicate that no string was found when searching. Now our findStopCodon method also returns negative one. That change means that our test function won't work correctly. Remember, we had a test function down here. These functions will now fail, because they didn't find that. So to see that very quickly, I'm going to run this new class, create an object on the object workbench, and run testFindStop, and you can see I got an error on 26 twice. That's because in my program I had if it's not equal to 26. So if I change that to negative one, compile, and now test my program again, finding stop codons, I can see that my tests finished. So my test program needed to change to recognize this return value of
negative one rather than the length of the string. Now I'm confident that my findStopCodon method with this one small change works correctly and that it returns negative one to indicate that no stop codon was found. That means my findGene method is also going to need to change. Rather than using Math.min, I'm going to need these boolean expressions that you've just learned about, and I've put here in my comments what those boolean expressions are supposed to do. So I'm going to simply translate the comments from my seven-step process into code. What that code says here is: if taaIndex is equal to negative one, or, and we use that double vertical bar for or, tgaIndex is not equal to negative one and tgaIndex is less than taaIndex. Now that's a lot, so let's look and make sure we got that right. It says if taaIndex equals negative one, or tgaIndex is not equal to negative one and tgaIndex is less than taaIndex. In that case, it says set minIndex to tgaIndex. So I'm going to say minIndex gets tgaIndex, just as it says there. That means I'm going to not use these versions of minIndex, which were from before. But if I compile this code, I'm going to get an error message, something about temp. So if I comment that out, let me comment all of these out, now I get that variable minIndex isn't known. So I'm going to define minIndex, and I need to give it some value; I'll just give it zero. I'm going to put in my else statement here, what it says to: else set minIndex to taaIndex. Else minIndex gets taaIndex. So what I've got now in my code is a translation of: if taaIndex is equal to negative one, or tgaIndex is not equal to negative one and this expression, then set minIndex to tgaIndex, else set minIndex to this value. Then I still have yet another boolean expression to write. I need to write if minIndex is equal to negative one, or tagIndex is not equal to negative one and tagIndex
is less than minIndex. In that case, my comments say set minIndex to tagIndex. I'm writing that; I'm simply translating this into my code here. Finally, it says if minIndex is equal to negative one, rather than dna.length(), return the empty string; otherwise, return this. My program compiles. Well, almost compiles: I forgot a second parenthesis up here, but that's an easy thing to fix. Well, maybe not fixing it that way, by erasing it all; maybe I should type in a parenthesis instead. Now when I compile my program, it works. How do I test this? Before, in the previous version, I didn't have a test method. Now I do: I have my test method, testFindGene, so I'm going to try that out and see if it works. Right click to create an object, right click to run testFindGene, and it's just simply printed tests finished, which is what I wanted. In testFindGene, I looked for a gene: I found this start codon, and I found this stop codon, TAA. Now I could change this TAA to a different stop codon, and I would keep testing those to make sure that my method works correctly. I'll leave that to you because, as we've already seen, testing is as important as writing your code, because you need to be sure your methods work correctly. Have fun testing. Have real fun programming. Hi. So far, you've written a method to find a gene and gradually improved it. It's gotten much more sophisticated than when you first started, although it's still a bit of a simplification of what you would actually do. Rather than continuing to refine this method, let's ponder a different facet of searching for genes. So far, you've only been looking for the first gene in the string. However, strings may have many genes. What if you wanted to find them all and print them all out? You can already find one, although you might want to make some slight adjustments to the methods so you can start looking midway through the string. Since you can find one and you want to find many, you want to repeat steps using a loop. Loops should be getting pretty familiar by now.
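As a recap of the single-gene finder built up to this point, here is a minimal sketch of the version in which findStopCodon returns negative one and findGene picks the earliest valid stop codon with || and &&. The class name GeneFinderSketch, the use of static methods, and the sample strands in main are assumptions made for illustration; the course's own class is not shown in this transcript and may differ in detail.

```java
// Illustrative sketch only: class name and sample strands are hypothetical.
class GeneFinderSketch {

    // Returns the index of the first stopCodon that is a multiple of three
    // away from startIndex, or -1 if there is no such occurrence.
    static int findStopCodon(String dnaStr, int startIndex, String stopCodon) {
        int currIndex = dnaStr.indexOf(stopCodon, startIndex + 3);
        while (currIndex != -1) {
            int diff = currIndex - startIndex;
            if (diff % 3 == 0) {
                return currIndex;
            }
            // Not in frame: keep searching just past this occurrence.
            currIndex = dnaStr.indexOf(stopCodon, currIndex + 1);
        }
        return -1;
    }

    // Returns the first gene (ATG through the earliest in-frame stop codon),
    // or the empty string if there is no start codon or no valid stop codon.
    static String findGene(String dna) {
        int startIndex = dna.indexOf("ATG");
        if (startIndex == -1) {
            return "";
        }
        int taaIndex = findStopCodon(dna, startIndex, "TAA");
        int tagIndex = findStopCodon(dna, startIndex, "TAG");
        int tgaIndex = findStopCodon(dna, startIndex, "TGA");

        // Pick between taaIndex and tgaIndex, treating -1 as "not found".
        int minIndex;
        if (taaIndex == -1 || (tgaIndex != -1 && tgaIndex < taaIndex)) {
            minIndex = tgaIndex;
        } else {
            minIndex = taaIndex;
        }
        // Apply the same logic to choose between minIndex and tagIndex.
        if (minIndex == -1 || (tagIndex != -1 && tagIndex < minIndex)) {
            minIndex = tagIndex;
        }
        if (minIndex == -1) {
            return "";
        }
        return dna.substring(startIndex, minIndex + 3);
    }

    public static void main(String[] args) {
        System.out.println(findGene("ATGCCCTGATAA")); // prints ATGCCCTGA
        System.out.println(findGene("ATGCCTAACC").isEmpty()); // prints true
    }
}
```

In the first sample strand both TGA (index 6) and TAA (index 9) are in frame, so the first comparison keeps TGA; in the second, the only TAA is out of frame, so all three indices are negative one and the empty string comes back.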
Since you might want to repeat things as long as there are more genes, you can make use of the while loop that you recently learned about. However, there's a bit of a difficulty here: we won't know if there are more genes until we start searching for them. This may make it seem hard to write the loop condition. There are many ways to deal with this situation in code, but the one we're going to teach you about is how to use a break statement to leave the middle of the body of the loop. We'll work through a slightly shorter example than we normally walk through in developing the algorithm to print all genes. If you don't understand this completely when we finish, pause the video and work through steps one, two, and three yourself. We're going to show you how it operates on this DNA string as we write it. First, we're going to set startIndex to zero; startIndex will represent where we start looking for the next gene. Then we're going to repeat some steps as long as there are more genes after startIndex. We want to find the next gene after startIndex, we want to print that gene out, and then set startIndex to just past the end of the gene we found. To show you how the algorithm would continue working on the string, we then go back to step two and keep repeating these steps as long as there are more genes after startIndex. Notice this is the difficulty we alluded to before: we need to know if we will find more genes or not, but we haven't looked for them yet. We'd find the next gene, print it out, update startIndex, and then realize we should stop repeating steps because there are no more genes. However, we have this difficulty: we need to know if there are more genes in step 2, but we don't look for a gene until step 3, which makes the algorithm a bit awkward to implement. Really, we'd like to make a decision about whether to keep going or to stop right here, between steps 3 and 4. Here's a slightly modified version of the algorithm which does exactly that. Notice that our repetition instructions no longer
have any condition; they just say repeat these steps. We'll figure out later when to stop. Likewise, we now have another step in the middle of the loop that says leave the loop if we did not find any genes. This will be much easier to implement once we learn how to translate these kinds of steps into code. For step 2, repeating steps without checking any particular condition, you can simply write while (true). If the condition of a while loop is simply true, the code will always enter the loop body when we reach the top, because true always evaluates to true. The other new piece of Java syntax we'll need is the break statement. This is how you say leave this loop. In this example, we've translated if no gene was found into an if statement that says if gene.isEmpty(). The isEmpty method for a String returns true if the string is the empty string and false otherwise. Remember, our gene finding method returns the empty string whenever it can't find a gene. Inside the if statement, we've seen leave this loop translated into a break statement. A break statement, which is written in Java simply with the keyword break followed by a semicolon, causes Java to leave the current loop, regardless of whether it's a while loop or a for loop or any kind of loop you might yet learn about, like a do-while loop. Basically, Java jumps past the closing right curly brace that ends the loop. Now we know how to implement these steps. Let's turn the algorithm into code and try it out. We'll find every kind of gene there is. Have fun. So you've been learning about while loops and break statements and working on your gene finding algorithm, and your current task is to write a method which will print out all the genes in a sequence of DNA. So we have here the printAllGenes method, which takes in a String for the DNA and has the algorithm, or pseudocode, that we came up with in a previous video. Now, we have earlier here in this code the gene finding algorithm that you developed along with Owen in a previous video, and we're just going to
make one small change to it. I'm going to have it take one more parameter to say where to start, and this is going to let us start looking for a gene somewhere in the middle of the sequence of DNA. So once we've found one at the beginning, we can start looking for one after that by passing in this index, and we'll just pass that to indexOf to tell it where to start looking for the first start codon. With that small change, we're going to go back here and start translating our algorithm into code. The first thing we said to do was to set startIndex to zero. We haven't made a variable called startIndex yet, so the first thing we need to do is declare startIndex. It's going to be an int, since we're setting it to zero; it's going to tell us the position in the string, also a good clue that it's an integer. Then we're going to repeat the following steps. You're used to repeating steps by now: you've done things for each pixel, or each of a variety of other things, before with for each. You've also seen some examples of while, where you've had a condition for as long as something. But here we want to repeat some steps and then figure out somewhere else, inside those steps, when we're done. So we're going to write while (true) and then put curly braces around our steps, and then we're going to go in here, start doing stuff, and then figure out when we want to stop repeating these. Now, we want to find the next gene after startIndex. Finding a gene is a big, complicated step. You've been devoting lots of effort and thought to how to do that, so we've abstracted that out into a method, so we can just call it and let findGene, which we've already written, do all that work for us. So what does findGene take? It takes a String for the DNA and an int saying where to start, and then it returns a String. So here we want to call findGene, passing in what DNA? Well, this DNA that we're working with. And where do we want to start? At startIndex. And then it's going to give us back a
String as its answer, so we want to give that a name. We're going to call it currentGene. We didn't say to give it a name in our steps, so we kept referring to it without a name. We kind of know what we mean, but it's good to give things names in your steps anyway. And now we can refer to that gene that we found. If no gene was found, leave this loop. So, if something is the case, do some other steps: by now you should be getting very familiar with if statements. So, if no gene was found. How do we know if no gene was found? What does findGene do if there's nothing left? If you don't remember, we can go back up here and look. We can see that it returns this empty string. So we want to know: is currentGene the empty string? Fortunately for us, Strings have a .isEmpty method, which will tell us if they're the empty string. How would you find that out if you didn't know? There's the Java documentation. Are there other ways to figure out if a string is empty? Sure, you could see if its length was zero, for example. Any way that you can find that works is a good way to do it. All right, so if this gene is empty, if currentGene is empty and we didn't find one, what do we want to do? We want to leave this loop. That is, when Java is executing along and it gets here, we want it to jump down here past this loop, using the break statement that we just learned about. So we would get here and then come down here; we basically put the loop condition in the middle. All right, if we don't break out of that loop, we want to print out that gene: System.out.println, hopefully you're getting somewhat familiar with that. We can print out that currentGene, and then we want to update startIndex to be just past the end of the gene. All right, so now this seems a little tricky; it's a little bit complicated, and maybe we have to think it through a little bit. So let's just go down here to where I've got some testing and think about what we're going to do. I've written some methods to help us test, and let's think through one of these cases here. So here's, so
this testOn just prints out that we're testing it and then does printAllGenes. So we say, okay, startIndex is going to be zero. That's the gene right here, which is length nine. All right, and so then after that, we would want our new startIndex to be nine. Okay, so we might think we just want to add the length of the gene; that got us from zero to nine. That's a good first guess. Now let's think about this next one. So then we're going to go and we're going to find this gene here, which is length three, six, nine, twelve, fifteen. But that didn't start right where we left off, right? So if we were to only add fifteen, we'd end up here in the middle of this gene, and we want to end up over here. So what do we need to do? Well, we need to add the length to the start position of this gene. How can we find that gene? Well, indexOf is becoming your friend, right? So startIndex is going to be dna.indexOf of currentGene, starting from startIndex, plus currentGene.length(). That seems kind of magical, so let me just explain that again really quickly. All right, we want to find where that currentGene was in the string. We can do that by looking for it again, starting at startIndex. So that's going to tell me this gene was right here, and then we can add its length to end up past here. That'll work for this first one, because looking for this first gene from a startIndex of zero is just going to give me zero; then when I add its length, I'm going to end up here. All right, if that still doesn't make sense, work it out on your own. So now I'm going to hit compile, all is well there, and I'm going to come over here and make a new object, and then I'm going to run my test method. You can see it says, all right, I'm testing printAllGenes on this sequence of DNA, and it's got two of these. Back over in my code, I tried to make this a little easier for us to see, so we've got this gene here, and then we've got this gene here. These little v's are just for me to keep track of the codons. And those were what it printed here. I tested
it on the empty string. I wouldn't expect to find any genes in there, but it's always good to test on these wacky corner cases where maybe our code would break if we weren't careful. We didn't find any genes, but our code didn't crash either. That's really good. And then on this one, where we have this one long gene here, which we found and printed out, and then this really short gene here, which we also found and printed out. So that worked well, and the important thing here, the new lesson, was this while (true), figure out if we want to keep going in the middle, and then break. All right, now you can iterate over all the genes in the DNA string and print them out. If you wanted to iterate over the genes in the DNA string and do something else to them, the algorithm would be pretty similar. In fact, for whatever you want to do with each gene, it would look pretty much like this. The line in blue is the only thing you would change to do whatever you might want to, maybe printing only those genes that meet some condition, counting genes, saving them to a file, building a web page with all of them, or anything else you can think of. So if you wanted to do these other things, you might copy the algorithm you have, paste it into a new method, and edit that line to make small changes. This approach works, but it's generally a bad idea. Why?
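For reference while thinking about that question, the print-all-genes loop described above might be sketched as follows. This is an illustration, not the course's exact code: the class name AllGenesSketch, the static methods, and the sample strand in main are assumptions, and findStopCodon here is the variant that returns the length of the string so that Math.min can be used.

```java
// Illustrative sketch only: class name and sample strand are hypothetical.
class AllGenesSketch {

    // Returns the index of the first in-frame stopCodon after startIndex,
    // or dnaStr.length() if there is none (so Math.min can be used directly).
    static int findStopCodon(String dnaStr, int startIndex, String stopCodon) {
        int currIndex = dnaStr.indexOf(stopCodon, startIndex + 3);
        while (currIndex != -1) {
            if ((currIndex - startIndex) % 3 == 0) {
                return currIndex;
            }
            currIndex = dnaStr.indexOf(stopCodon, currIndex + 1);
        }
        return dnaStr.length();
    }

    // Finds the first gene at or after position where; "" if none is found.
    static String findGene(String dna, int where) {
        int startIndex = dna.indexOf("ATG", where);
        if (startIndex == -1) {
            return "";
        }
        int taaIndex = findStopCodon(dna, startIndex, "TAA");
        int tagIndex = findStopCodon(dna, startIndex, "TAG");
        int tgaIndex = findStopCodon(dna, startIndex, "TGA");
        int minIndex = Math.min(taaIndex, Math.min(tagIndex, tgaIndex));
        if (minIndex == dna.length()) {
            return "";
        }
        return dna.substring(startIndex, minIndex + 3);
    }

    static void printAllGenes(String dna) {
        int startIndex = 0;
        while (true) {
            String currentGene = findGene(dna, startIndex);
            // The loop condition lives in the middle: stop when nothing is found.
            if (currentGene.isEmpty()) {
                break;
            }
            System.out.println(currentGene);
            // Move just past the end of the gene that was found.
            startIndex = dna.indexOf(currentGene, startIndex) + currentGene.length();
        }
    }

    public static void main(String[] args) {
        // Prints ATGAAATAA and then ATGCCCTGA, one per line.
        printAllGenes("ATGAAATAACCATGCCCTGAGG");
    }
}
```

The only line that depends on what is done with each gene is the System.out.println call; everything else in printAllGenes is the part that would be duplicated by copying and pasting.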
Well, for one thing, copy and paste is error prone. You might forget to change some things that you need to. Even worse, if you find a bug in your original implementation after you have made copies, you need to go fix every copy you made. It is also tedious: you have to go find the method, copy and paste it, and change it. This may not be so bad if you want one variation, but if you want to do five different things, it's pretty boring. And finally, it indicates bad programming design choices. Whenever you find yourself wanting to copy and paste, there's almost always a better approach. Let's take a moment to see something that we could improve about this algorithm and why it would make a lot of work if we leave it as is; then we'll understand the motivation for how to fix it. This is our algorithm to print all the genes in a string. We're going to condense it down to a small description at the bottom, and then copy and paste it and change line 5 to print only the genes with a high CG ratio, and then we'll condense that down to a short description. Now we're going to copy, paste, and edit several other algorithms to do various things with the genes from our DNA string, like printing genes in HTML, writing genes to an output file, counting genes with the codon CGA, or whatever else you want to do. All of these algorithms are the same except for the details of what they do to the DNA string. So at first, copying and pasting does not seem like a big deal. Later, we end up with some other DNA data, which lists all of the genes in a file, one gene per line. We need to do the same sorts of operations on this data too, but the algorithm will be slightly different: it will have a for each line in the file loop, then do the same operations for the genes. With our copy and paste approach, we now need to write and test six algorithms. They are pretty similar to each other, so it may not be so hard, but it's tedious, error-prone work. Then, if we end up with some other source of data, we're going to have to go
make all six algorithms again for that data source. Likewise, if we end up with a new operation that we need to do, we're going to have to write three copies of it, one for each data source. Ick, what a mess. What we would really like to do is redesign our algorithm to use separation of concerns. Our initial algorithm does two tasks: one is getting all the genes from some source of data, and the other is printing them, or whatever else we want to do to each of them. We would like to split these up by having the algorithms that find the genes put them into some structure that can hold a list of all the genes, then having the algorithms that print the genes, count the genes, or whatever else we want to do to the genes operate on that list. Now if you need to add some new source of data, you just write a way to get its genes into our list, and it automatically works with every processing algorithm you already wrote. Likewise, if you need to write a new processing algorithm, it automatically works with every source of data that you already wrote. No copying and pasting is ever needed. So what is this thing that can hold all the genes your algorithm finds? We're going to start by using a class from the edu.duke package called StorageResource, which is a simplified way of doing this. Later on, once you have learned a few more concepts, you will transition to using the standard java.util.ArrayList class, which has similar functionality but is a lot more complex. Thank you. Now you've learned that you want to store your data in a list so you can separate concerns. You learned you'll start out with our edu.duke.StorageResource class, which is a simplified way of doing this. What exactly is a StorageResource, and how do you use it? It's a class that holds a collection of strings. You can call .add to put a string into the StorageResource. You can also use .data to get an iterable, so you can iterate over all the strings that have been put into the StorageResource you created. There are a few more
methods, and you can read about them on the DukeLearnToProgram.com website, where there's a documentation page for the StorageResource class. Let's look at an example of using it. First, we're going to declare a variable sr of type StorageResource. Once we've declared the variable sr of type StorageResource and initialized it to a new StorageResource, we'll see that it's empty. Then we might add a string such as hello, then we might add another string such as world, and then we could iterate over all the strings in the StorageResource by writing a for-each loop and using sr.data. Let's see what happens when we step through this code. The first line declares a variable and draws a box for it labeled sr, and then calling new creates a new StorageResource. It's going to be an empty list of strings, which the sr variable will refer to. We'll add the string hello to the StorageResource's list of strings. Similarly, sr.add with world will add world to the list. Now we're at the for-each loop. We're iterating over sr.data, so we'll make a variable s that we're going to use to iterate, and have it refer to the first item in the StorageResource object we created, the string hello. Inside the body of the loop, we print out s, so we print hello. Then we go back to the beginning of the loop and refer to the next string in the list, in this case world, so we print world as we enter the loop. Once we've reached the end of the loop body, we go back to the beginning of the loop and see that there are no more strings in the list, so we'll go past the loop, and we're done iterating over the elements of sr. If you want to learn more about other methods in StorageResource, or you forgot about the details of the ones we discussed here, you can find the documentation for this class on DukeLearnToProgram.com. Finally, let's see what our algorithm would look like to find all the genes and put them into a
StorageResource. It's pretty much what you had before, as we show here, except we've made three changes. First, at the beginning of the algorithm, we make an empty StorageResource to put the genes we find in. Second, where we do something to each gene we found, we add it to the StorageResource. Finally, at the end, we give an answer: the StorageResource with all the genes in it. Rather than printing them out, this method will return a value to whatever code called it, so the caller can use the StorageResource and the data inside it for whatever purpose they want. That could be to print it, or it could be to process that data further. Have fun with StorageResource. All right, so you've learned that what you would like to do now is separate finding the genes from printing the genes. The way we're going to do this is we're going to take all the genes you find and put them in a StorageResource, and then you can iterate over that StorageResource and print them out. Or, if you wanted to do other things with the genes and calculate properties about them, whatever, you could iterate over the StorageResource and do those without duplicating all of your code to have another thing that finds them all. So I've taken our code that prints all the genes, which we developed in a previous video, and I've made a couple of small changes to it. I changed it from being called printAllGenes to getAllGenes, since we're going to get them instead of print them. It returns a StorageResource now, because that's what it's going to give back: this thing that holds all of these strings. I've altered the comments here with the steps to be the ones that we developed in the previous video, where we came up with the algorithm. Where these have been the same as before, which is most of the steps, I've left that code. For the one where we were previously printing it and are now adding it to a list, I deleted the old code, the System.out.println, which we don't want,
but otherwise this is pretty much the same. And then in our test methods, I've removed that code, because we're going to have to do something a little bit different. So let's start by turning these steps into code. The first thing we want to do is create an empty StorageResource and call it geneList. So geneList is going to be a new variable. We need to declare that its type is StorageResource, geneList is its name, and it's going to be an empty StorageResource, a new one with nothing in it: a new StorageResource. And then these steps are already translated to code, until we get down here to where we want to take that gene and add it to our gene list. So that's just going to be geneList.add and that gene. And then at the end here, our answer is geneList. As you've seen many times by now, when your method or function ... Okay, so now I want to write my method that's going to test this, and we're going to test it by printing things out. So the behavior of our code is going to be just what we did before, but it's going to be more useful, more reusable: we could put it to other purposes than just printing out these genes. So the first thing I want to do is call this method getAllGenes that we just finished writing and store its result in a StorageResource variable, StorageResource, which I'm not sure I can spell. We'll just call this genes equals getAllGenes on that DNA. Now we want to iterate over the things in it. So I have here the Duke Learn to Program documentation, because I'm not very familiar with StorageResource. It's not one of the standard Java classes; it's kind of a nice simplification for you. If I didn't remember how to iterate over all of these things, I could look at this and say, okay, add, we just used that; size; data returns an iterable. Those are the things that we can write a for-each loop over. They even give us this little example here. So what we're going to do is, for all of the strings, all of the genes, which are strings, in these genes we just got back, we want to print that particular gene out.
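The two pieces described so far, a getAllGenes method that returns a list and a separate loop that prints it, might look roughly like the sketch below. It is only an illustration under stated assumptions: SimpleStorage is a hypothetical stand-in for edu.duke.StorageResource (so the example runs without the course library), and findStopCodon and findGene are simplified helpers written for this sketch, not necessarily the code on screen in the video.

```java
import java.util.ArrayList;
import java.util.List;

public class GeneListSketch {
    // Hypothetical stand-in for edu.duke.StorageResource: it holds strings and
    // offers add(), data(), and size(), the methods named in the lecture.
    static class SimpleStorage {
        private final List<String> items = new ArrayList<>();
        void add(String s) { items.add(s); }
        Iterable<String> data() { return items; }
        int size() { return items.size(); }
    }

    // Assumed helper: index of the first stopCodon that is a multiple of 3
    // characters away from startIndex, or dna.length() if there is none.
    static int findStopCodon(String dna, int startIndex, String stopCodon) {
        int currIndex = dna.indexOf(stopCodon, startIndex + 3);
        while (currIndex != -1) {
            if ((currIndex - startIndex) % 3 == 0) {
                return currIndex;
            }
            currIndex = dna.indexOf(stopCodon, currIndex + 1);
        }
        return dna.length();
    }

    // Assumed helper: the first gene at or after position where, or "" if none.
    static String findGene(String dna, int where) {
        int start = dna.indexOf("ATG", where);
        if (start == -1) {
            return "";
        }
        int stop = Math.min(findStopCodon(dna, start, "TAA"),
                   Math.min(findStopCodon(dna, start, "TAG"),
                            findStopCodon(dna, start, "TGA")));
        if (stop == dna.length()) {
            return "";
        }
        return dna.substring(start, stop + 3);
    }

    // Finds every gene and returns them in a list instead of printing them.
    static SimpleStorage getAllGenes(String dna) {
        SimpleStorage geneList = new SimpleStorage();   // start with an empty list
        int startIndex = 0;
        while (true) {
            String currentGene = findGene(dna, startIndex);
            if (currentGene.isEmpty()) {
                break;                                  // no more genes: leave the loop
            }
            geneList.add(currentGene);                  // was a print statement before
            startIndex = dna.indexOf(currentGene, startIndex) + currentGene.length();
        }
        return geneList;                                // the answer goes back to the caller
    }

    public static void main(String[] args) {
        SimpleStorage genes = getAllGenes("ATGATCTAATTTATGCTGCAACGGTGAAGA");
        for (String gene : genes.data()) {              // printing is now a separate step
            System.out.println(gene);
        }
    }
}
```

Because finding and printing are separate, a different use of the genes (counting them, filtering them) only needs a new loop over the returned list, not another copy of the search code.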
I'm going to hit compile. Whoops, up here this was called currentGene; Java is fussing at me because it has no variable there called just gene. Fix that. And here, whoops, I looked up the method and then I forgot to call it. That was good, easy to fix. I'll get rid of this documentation. I'm going to come over here, and I'm going to make a new one of these and then run it. And it ran just fine and gave me the same results as before, but our code is now more reusable; we could put it to other purposes more easily. Hello, and welcome to this introduction to using Java to help find trends and patterns and draw conclusions about information found in data. You see here two screenshots of spreadsheet programs that have been used to analyze data over many, many years. On the left is a screenshot of a Google Docs spreadsheet that you can use today. This program analyzes data and runs in the cloud. It is accessible via a browser or mobile app on all kinds of devices all over the world. On the right is a screenshot of VisiCalc, the first spreadsheet program, which launched in 1979 and ran only on Apple II computers. In general, spreadsheet programs work on data formatted in rows and columns: tabular data. You'll be able to write Java code to analyze such data as well. Spreadsheets revolutionized so many industries, and launched new ones, by taking seconds to model what-if scenarios that had previously taken days to perform. The link here is to a podcast describing the development of VisiCalc and the industries that were transformed by these programs. Today, data that can be analyzed by software programs is often made publicly available through government and nonprofit websites. Typically, data is produced in CSV files, files in which the different data values in each row are separated by commas; thus the name comma-separated values. You're going to learn how to write Java programs to analyze data stored in CSV format. Using spreadsheet software is a great way of finding patterns, information, and trends, and visualizing data, but
sometimes a spreadsheet program isn't enough to solve every problem easily. There are many different spreadsheet programs, so a common format is useful. The CSV format makes it possible for data to be portable between different types of software used to analyze data. Furthermore, you can write your own Java programs to analyze data using the CSV format. Common formats often have standards. For example, the Internet Protocol, or IP, standard determines the format for the packets that transport information across the internet. The IETF, or Internet Engineering Task Force, that made the IP standard also created a standard for CSV files. Other groups have made different but related standards for formats used in different software programs. In this lesson, you will learn how to use an open source software library, the Apache Commons CSV parser. As you learn more about Java programming, this software library will allow you to solve problems that would be really difficult to solve using just a spreadsheet. Let's get started with this. Hi. The code you're about to see will show you how to access a comma-separated values file named foods.csv using Java. To better understand that program, here are three views of the data in the foods.csv file. The file itself looks like this, with the first row having labels used for each column. This is the header row. You can see the labels for each column of data, and you can see the data in each row is separated by commas. Here's the view using the spreadsheet program Microsoft Excel. Some of you may have used this program before. The Google Sheets program that runs in a web browser is free software that also allows you to manipulate spreadsheets. Here's the view of foods.csv from that program. You can see the first column has the label name, the second column has the label favorite color, and the third column has the label favorite food. Let's get coding. I'm going to walk through a simple example of using the CSV libraries that we have in our course so that you can understand how
to create a CSV parser and how to use it in the most basic ways. More complex ways will be something that you learn in later parts of this course and that you can read about when you study the API. So I have a simple first CSV example. I'm going to open up the code so that we can see a few things, but rather than studying it in detail right now, I'm going to run it very quickly so that we can understand how it works, and then we're going to walk through each piece and make one small modification. So my class is already compiled; I can see that because there's nothing shaded here. I'm going to right-click and create a new object on the object workbench. I already have one; now I have two. And I'm going to call readFood from that. That pops up a file dialog. This is an example of our directory resource, and I choose foods.csv, a CSV file, and the program comes in and reads and prints Drew, Owen, Susan, and Robert. So I'd like to try to understand why it's printing those names, and then see if we can find some more information in this CSV file other than just our names. Looking at the source code again, notice a few things. First, I've imported the edu.duke libraries, which is common in many of the examples we do. Because I'm using the CSV parser, I need one more import, and that's a very complicated one, but kind of something that you'll just be able to cut and paste after a while: org.apache.commons.csv. We're using an open source library for our CSV parser, and we've made it a little more convenient to use in a way that I'll explain. There's one method, readFood. I've created a FileResource object that's using our standard library. Because it has no parameters, the FileResource object will pop up a dialog and allow me to navigate to the file I want to use. I just showed you using foods.csv. Then I ask the FileResource object, fr, to give me the parser: getCSVParser. This is the new class, the CSVParser class, that's part of the Apache library that you can see highlighted on the screen. I now loop over the
iterable that is the parser, getting a CSVRecord each time. So I have two new classes here: the CSVParser class and the CSVRecord class. The CSVRecord class has one method that I'm using, get, and that allows me to get one of the fields from the record on that line of the CSV file. As you may remember, CSV files consist of several elements of data separated by commas. One of the elements is named name in this case. Since I've studied the CSV file, I know that one of the other elements is named favorite food, so if I ask the record to get me the field, the element, that's favorite food, that will do it. I'm going to print this one with just a space after, so notice I've changed the println to just print; that stays on the same line. println will finish the line. I'll show you how that output works. My class is compiled, no errors. I'm going to create a new object on the workbench by right-clicking, which is what I normally do here. So I'll select, right-click, create a new one, and then on the object workbench I'm going to call readFood, navigate to the foods.csv file, and notice now I have: Drew's favorite food is chocolate, Owen's favorite food is pineapple, Susan really likes cake, and Robert likes pizza. One more example. It turns out that in addition to favorite foods, in this CSV file there's also a favorite color, so I'm going to print the favorite color. We'll look at those briefly, and then we'll review finally. So, favorite color, and I should put a space at the end of that line so it's easier to read. I'll compile that. When I create the new object, it appears on my object workbench. I'll run that, navigate to the foods.csv file, and lo and behold, Drew's favorite color is green. Surprisingly, Robert's favorite color is green too. If you're a very astute observer of the courses, of what we've done so far, that might make sense to you. Susan loves purple, and I like blue. That is the run-through. Let me review one more time. In this CSV file that this program read, there are three fields: name, favorite color, and favorite food. If
you tried to get another field, so for example if I decided to say that my favorite number was get favorite number, that will compile. And when I try to run this example by making a new object and running it by right-clicking (sometimes I can't right-click too well), I open foods.csv, and I've got all kinds of errors: an IllegalArgumentException for favorite number, favorite number not found. My CSV file does not have a favorite number field, and so I could not get that field. When we study this in more detail, and when you read the API, you'll see that there are ways that you can avoid trying to access CSV elements that don't exist. But for right now, we've seen how to use a library appropriately, that's org.apache.commons.csv, get the CSV parser from a FileResource object, and then loop over the parser, which is an iterable, to get records one at a time. Have fun with CSV and finding information from data. Welcome back. Now that you know a little bit about the basics of working with CSV files, it is time for you to solve a problem using them. Here we have a CSV file which shows export data for 218 countries. The columns are the country name, the main exports, and the total value of their exports in 2014 in US dollars. A problem you might want to solve with this data is to find which countries export a particular item. For example, you might want to print all of the countries which export lobsters, or all of the countries which export iron. There's a lot of data in this file, way too much to look at by hand, which would be tedious and error prone. Instead, you would rather write a program to perform this analysis for you. As always, we will walk through how you write such a program using the seven steps. Step one is to work a small instance of the problem by hand. Here we have cut the table down to four countries and only listed a few exports for each of them. Which of these countries exports coffee? After looking at the table for a few seconds, you can see that Madagascar and Malawi export coffee, and Macedonia
and Malaysia do not. Next, you need to write down what you just did. If you just say, I just looked at it and saw the answer, that will not help for step three. Instead, you need to think through what you did in a more step-by-step fashion. How did you look at the table and see the answer? You probably looked at the first row's exports column and saw that it did not contain coffee. Next, you looked at the second row's exports column and saw that it contains coffee, so you wrote down Madagascar. Next, you looked at the third row's exports column and saw that it also contains coffee, so you wrote down Malawi. Finally, you looked at the fourth row's exports column and saw that it did not contain coffee. Thinking through this in a step-by-step fashion leads to these 10 steps for this particular instance of the problem. Now we are ready to look for patterns and generalize. Notice how you are doing very similar things for each row. This similarity suggests you will eventually want to loop over each row. However, there are some differences in the steps for each row that you need to think through before you can express this algorithm in terms of for each row. The first difference you might notice is that on line five you wrote down Madagascar, and on line eight you wrote down Malawi. Where did these particular words come from? They are the value of the country column for the row you are looking at. The next difference you might notice is that you do not write down the country for the first and fourth rows, but you do write it down for the second and third. How did you know whether to write the country down or not? You made this choice based on whether or not the row contained coffee in the exports column. Now that you have figured out how to operate on each row in the same way, you can write down the generalized steps in terms of for each row in the CSV file, looping over however many rows the file has. Now that you have a general algorithm to solve this problem, it is time to test it out. Here is a table of data which
contains two countries. Use the algorithm to find out which ones export lobster. Is this algorithm correct? According to our algorithm, Brazil is the country which exports lobster. Oops, that's not correct; we should have gotten Anguilla. Did you test the algorithm carefully enough to notice this mistake? The problem is that our algorithm always checks for coffee, not for whatever export we are interested in. We should have thought this through as we generalized, but we missed doing that. This is exactly the sort of problem step four, testing your algorithm, is intended to catch before you write code. You can fix this algorithm by referring to the parameter, we'll call it exportOfInterest, which indicates which export you want to look for. Making this change to the algorithm will have it yield the right answer on this test case. Now that you have an algorithm, it's time to write the code. Now that we've developed our algorithm to figure out which countries export a particular item, it's time to turn that algorithm into code. I'm here in BlueJ, where we've created a class and imported edu.duke.*, which you've used for many things, and org.apache.commons.csv.*, which you've learned recently is what you need for the CSVParser class. I've written a method here, listExporters, which takes in a CSVParser, so that's going to be the CSV parser that's already open for the file you want, and a String for the export of interest. And I've written down comments here for the algorithm we just devised. So the first step in our algorithm, which has a slight typo in it, said for each row in the CSV file. We've already learned that that's for, CSVRecord is the type of item that you iterate over in the CSV file, and we'll call it record, from the parser. And I'm going to put curly braces around all of the steps that go inside of there. And then we said look at the exports column. You learned this before; I'm just going to call that exports. This is record.get and the name of the column you want, in this
case exports. Check if it contains export of interest. So now we need to check and see if this string export contains exportOfInterest. One way we could do that, which you've seen before, is export.indexOf of exportOfInterest is not equal to negative one. Now, if you were to look in the Java String API, you could see that there's a little bit more readable and convenient way to do this, with a method that is just called contains. These two will do the same thing; it's just a little more clear to someone reading the code what you're trying to do if you write contains instead of indexOf not equal to negative one. I'm just going to put that back where it goes. And then we said, if so, write down the country from that row. So I need to first get the country: String country equals, and again I'm going to use record.get country to get the country column from that row. And then if I want to write something down, I would print it out. Okay, and then my curly braces all match up. I'm just going to hit compile. It says it compiled with no syntax errors. Now, as with other things, making a CSV parser in the BlueJ object workbench is a little complicated, so we're just going to make a method here to help us test this, which is going to be whoExportsCoffee. It's going to take no parameters, and it's going to tell us which countries export coffee from a particular data set. So I'm going to make a FileResource, which you've seen many times before, and as you may recall, if I create a FileResource without passing in any parameters, it will pop up a dialog to let me pick what file I want to choose. And then I'm going to want to get the CSV parser out of that. That FileResource has a way to give me a CSV parser which will parse that file's data. And then I want to listExporters from that parser for coffee. And I'm going to compile this again. It tells me invalid method declaration; return type required. I forgot to tell it what type it returns. In this case, I don't want to return anything, since I'm just
printing things out, so I'm just going to write void there, and then I'm going to try to compile again. That compiled just fine. I'm going to come over here and I'm going to make a new one of my objects, and I'm going to say whoExportsCoffee, and then I have this export data.csv file, which is going to list a whole bunch of countries. Now, we're trying to test out our code. We don't know if this is correct. How would we know this? Well, we'd have to go through that entire CSV file and check: did we get everything that exports coffee? That would be really tedious, and avoiding that work is the whole point of writing the program in the first place. What we should and can do instead is write a much smaller file. Here's the one that we used in the slides, and we can test our code on that instead. So if I come back here and I do whoExportsCoffee on this one, export small.csv, it says Madagascar and Malawi, which we know is the right answer. So that gives us more confidence in our code, and we can be more certain that we got the correct answer on the larger file. So there's the code for that, and now you can do similar things yourself. Okay, so now you know how to solve the problem of analyzing CSV data to find out which countries export a particular good. In this lesson, you learned the mechanics of using Apache's Commons CSV package, a library for working with CSV data, using the CSVParser and CSVRecord classes to operate on the data in CSV files. You devised an algorithm to analyze the CSV file and find all the rows that meet a particular criterion, in this case which ones export a specific good, but the same basic ideas would work for other criteria more generally. And of course, you implemented it in Java, putting your newly gained knowledge of the CSV library to use. Welcome back. Now that you know a bit about working with CSV files, it is time to learn how to analyze numerical data in those files. For example, here we have a CSV data file about the weather at the Raleigh-Durham airport for January 1st, 2014. We have this data
for every day in a couple of years, with one CSV file per day. Each row contains information about one hour of weather, and there are a variety of columns, such as the temperature in Fahrenheit, the dew point, the humidity, etc. If you are studying weather patterns, you might want to analyze this data and ask a variety of questions. Of course, the techniques you will learn are applicable to other types of data too, so you may find them useful in a variety of other fields as well. One question you might want to ask is, what is the maximum temperature? That is, when is it hottest? If you are doing this just for the data on one particular day, you could just look, since there are only 24 entries, or use the max function in a spreadsheet. However, what would you do if you wanted to find the maximum temperature over many days, such as for an entire year? You certainly do not want to look through all of the data by hand, and importing 365 files into your spreadsheet would be quite tedious. For something like this, you would want to write a program to do the work for you. In this lesson, that is exactly the problem you will solve: finding the hottest day of the year. For the purposes of this example, we are going to say that the hottest day of the year is the one with the highest maximum temperature. A related but slightly different problem is to find the day with the highest average temperature. We are not going to walk through that problem, but you could certainly do it after this lesson. The plan to write this program is to start by learning about dealing with numerical data. The CSV parser will read the data as strings, which have to be converted to numbers. Once you know how to convert strings to numbers, we will start with a smaller piece of the problem: just finding the maximum temperature on one day. We will walk through the algorithm and the code development with you. You will want to test your code to be confident that it is correct before proceeding. Once you are confident in your code to find the maximum
temperature on one day, you will want to build on it and find the maximum temperature for many days. This will let you find the maximum temperature in a year. So let's get started. Welcome back. As you work with data in CSV files, you may encounter numerical data such as this. Here we have a variety of numbers, such as years. However, the libraries you use to read a CSV file will read this data in as a string: the sequence of characters 1493, not the integer 1493. If you want to manipulate this data numerically (adding values, finding the largest, or other tasks), you will want to work with the data as an int or a double. So what would happen if you tried to write a line of code like this: int value equals row.get year? When you try to compile it, you would get an error message here that the types are incompatible: you are trying to assign a String to an int. Let us look a bit more at this error. If you look at the documentation for the CSV reader, which we show here, you will see information about the get method. In particular, you can see that its return type is String. Because that method returns a String, this entire expression, row.get year, has type String. Java knows that it will be a sequence of characters. However, that is not the same type as int, which is the type we declared value to be. This raises two questions we will answer for you. First, why is this a problem? Sometimes Java can automatically convert between two types, so why can't it convert a String to an int? It seems like if we have the string 1493, Java could just figure out that that should be the number 1493. However, it cannot do this in this case. The second question is, of course, how do you fix it? To see why Java cannot automatically convert a String to an int, consider what would happen if the string did not have numerical digits in it. For example, suppose our CSV reader read the string hello instead. What would you expect Java to do for this line of code?
The rules of Java require the compiler to reject this line of code, because there is no guarantee that you can meaningfully turn any string into a number. Another consideration is that the string 1493 has a very different representation from the number 1493. Even though everything is a number, the type of a piece of data describes how that number is represented and interpreted. We will not delve into the details of the representational differences here, but it takes some work to convert between the two. Accordingly, we would have to execute some code to perform the conversion algorithm. Such an algorithm would examine the digits in the string and compute the integer that they represent. We are not going to write this algorithm for ourselves, because it's already built into Java. You can just call Integer.parseInt to convert a string to a number. Even though it's built in, you still have to call it explicitly to tell Java that that is what you want to do. Here you can see an example of how you can use the Integer.parseInt method. You pass in the string you want to convert, and it returns the integer value. If you need to work with numbers that have fractional parts, you can use Double.parseDouble, which will accept a string like 42.56 and convert it into the corresponding value of type double, the type to use to work with non-integer numbers. For either of these methods, the string could be invalid; it could be something that does not represent a number. For example, if you pass hello to Integer.parseInt, the method cannot turn hello into an integer. So what happens? The method throws an exception, which indicates that an error has occurred. If your program throws an exception, then it will crash. There are ways to make your program handle the exceptions, specifying what to do instead of crashing, but we are not going to go into that more advanced topic here. So now you know that you cannot just assign a String to an int, and that if you need to convert a String to an int, you should use Integer.parseInt.
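As a minimal sketch of the conversions just described, the example below calls the standard Integer.parseInt and Double.parseDouble methods on strings like the ones a CSV library would hand back. The class name and the helpers toInt and toDouble are made up for this illustration, and the try/catch in main goes slightly beyond the lecture (which does not cover exception handling); it is there only so the example can show the failure on hello without crashing.

```java
public class ParseNumbersSketch {
    // Hypothetical helper: converts text such as "1493" to an int.
    // Integer.parseInt throws NumberFormatException if the text is not an integer.
    static int toInt(String text) {
        return Integer.parseInt(text);
    }

    // Hypothetical helper: converts text such as "42.56" to a double.
    static double toDouble(String text) {
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        String yearText = "1493";          // a CSV field arrives as a String
        // int value = yearText;           // would not compile: incompatible types

        int year = toInt(yearText);        // explicit conversion
        System.out.println(year + 1);      // arithmetic now works on the number

        double temperature = toDouble("42.56");
        System.out.println(temperature * 2);

        try {
            toInt("hello");                // not a number, so an exception is thrown
        } catch (NumberFormatException e) {
            System.out.println("Could not convert: " + e.getMessage());
        }
    }
}
```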
Likewise, for real numbers, you have learned to use Double.parseDouble. Now you are ready to manipulate CSV data with numbers in it. Thank you. Welcome back. The first piece of solving the problem of finding the highest temperature in a year, which has its data spread across hundreds of files, is to find the maximum temperature in one day, which always has one file. As always, we are going to start by working a small example by hand in a step-by-step fashion. Here we have six rows of data to work with. The first thing you might do is look at the first row, and in particular at its temperature, which is 30 degrees Fahrenheit. That is the maximum we have seen so far, which you will want to keep track of. If you were just working this out with a small data set, you might just remember it in your head, but we will draw a red box around it to be explicit. You might then go through the rest of the data in a similar fashion, looking at each row and deciding whether it is the largest you have seen so far or not. After you look through all the rows, you have found your answer, which in this data file is the last row. Okay, now that we have worked an instance of the problem by hand, it is time to write down exactly what we did in a step-by-step fashion. The first thing we did was look at the first row, in particular its temperature F column. Next, we noted that it has the largest temperature so far. The next step was to look at the second row. Its temperature is not greater than the largest temperature we have noted so far. Then we looked at the third row. Its temperature is 30.9, which is larger than the largest we have seen so far, so we updated our largest so far to be the third row. For the fourth row, we did very similar steps: we saw that 32 was larger than the largest so far, and updated our largest so far. The fifth row was not larger than the largest so far, and the sixth row was larger than the largest so far, so we updated our note of what was the largest. That was the last row, so we are ready to give the
answer. In this case, the sixth row was our answer. Now that we have all of those steps written down for this particular instance of the problem, we are ready to find patterns and generalize. The first thing you might notice is that you are doing similar, but not quite the same, things for each row of the CSV file. As you can probably guess by now, you will eventually write code which loops over the rows to solve this problem. However, before you can do that, you need to think about the differences and find ways to make them the same. The first difference you might notice is that for the first row we just noted that it was the largest so far, but for later rows we compared the row to what we had previously noted down as our largest so far. The first row is a bit unusual here because we have nothing else to compare it to. We did something implicit that we did not write down: we checked whether our largest so far was nothing or something first. We will need to incorporate that into our generalized steps. The other difference you might notice is that sometimes we updated what we recorded as the largest so far, while other times we did not. We marked the first row's update in purple, as we just discussed how it is different from the others, and marked the steps in red where we did not update the largest so far and in green where we did. What is the pattern? We update when the current row's temperature is higher than the largest so far's temperature. Thinking through these patterns leads us to the following thoughts on how to decide when to update the largest so far. If the largest so far is nothing, meaning we don't have one yet, then the current row is the largest so far. Otherwise, if the row's temperature is greater than the largest so far's temperature, then the current row is the largest so far. After thinking through that, we can express our algorithm in terms of "for each row in the CSV file". For each row, which we will call the current row, you will want to decide how to update the largest so far
variable, as we have just discussed. We have not said anything about what largest so far starts as, so we should be sure to put that in here. We mentally glossed over this while we were writing down our steps, but we implicitly started with it as nothing before we began looking at each row. We should write that down in our algorithm here. The last step, which we did write down, was to give our sixth row as our answer after we finished looking at each row. Is the answer always going to be the sixth row? No, it is always going to be the largest so far, the row that we have been keeping track of as we worked through the data. We should test this out before we try to write our code. Try it out on these four rows of data. Does the algorithm give the right answer? Yes, it does. We are now more confident that we wrote our algorithm correctly, so we are ready to turn it into code.
In writing the algorithm to find the maximum temperature in a CSV file, we wrote down steps that do not correspond to anything you know yet in Java: the idea of having nothing. How do you turn that into code?
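The generalized steps can be sketched in plain Java as follows. This is only an illustration under stated assumptions: each row is modeled as a String array of {label, temperature} rather than the course's CSV record type, "nothing" is written as Java's null (the concept the lecture introduces next), and the temperatures for rows 2, 5, and 6 are made-up values, since the transcript only gives 30, 30.9, and 32. The names LargestSoFarSketch and hottestRow are hypothetical.

```java
public class LargestSoFarSketch {
    // Returns the row with the highest temperature, or null if there are no rows.
    // Each row is {label, temperatureF}; this layout is an assumption of the sketch.
    static String[] hottestRow(String[][] rows) {
        String[] largestSoFar = null;                 // start with largest so far as nothing
        for (String[] currentRow : rows) {            // for each row
            if (largestSoFar == null) {
                largestSoFar = currentRow;            // first row: nothing to compare against
            } else if (Double.parseDouble(currentRow[1])
                       > Double.parseDouble(largestSoFar[1])) {
                largestSoFar = currentRow;            // current row is hotter: update
            }
        }
        return largestSoFar;                          // the largest so far is the answer
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"row 1", "30.0"}, {"row 2", "29.8"}, {"row 3", "30.9"},
            {"row 4", "32.0"}, {"row 5", "31.5"}, {"row 6", "33.1"}
        };
        System.out.println(hottestRow(rows)[0]);      // prints row 6
    }
}
```

Because the comparison uses a strict greater-than, a later row with an equal temperature does not replace the earlier one.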
This introduces an important new concept in Java: null. In Java and many other programming languages, null means nothing, or no object. This concept is very important, as it is common for algorithms to need to refer to the value "no such thing". One thing you can do with null is initialize a variable to it. For example, CSVRecord largestSoFar = null means initialize largestSoFar to be no such thing. Another use of null is when an algorithm has an answer of "it doesn't exist" or "there's no such thing". In that case, returning null from a method is an appropriate way to indicate that no such answer exists. You can also check if an expression is equal to null, which is quite useful in many algorithms. In the algorithm we are working on, you want to check if largestSoFar is nothing, that is, if largestSoFar == null. One thing you cannot do with something that is null is call a method on it. Since null means no such object, it does not make sense to try to call a method on no such thing. For example, this code is problematic: even though it compiles just fine, this second line will cause the program to crash when you run it. You would get an error message like this, which tells you what went wrong. The first thing about this error message is that it says "Exception", which generally means that something went wrong with your program. The last part tells you what kind of problem you had: java.lang.NullPointerException in this case means you tried to do something with null that needed an actual object, in this case trying to call a method on it. While we are discussing null, remember that all expressions have types. You have previously learned that it is important to know the types of expressions as you write and think about code. This raises an important question: what type of thing is null? We wrote CSVRecord largestSoFar = null and told you that was legal, so it might seem like the type of null is CSVRecord. But does that really make sense? Would Java be designed such that the type of nothing
is a CSVRecord? Shouldn't we be able to have nothing for other types too? Java actually has a special null type. Unlike other types, you cannot write down the name of this type in your program: you cannot declare variables of this null type, nor can you write methods whose return type is this special type. The literal null has this special type, and this type can be converted into any object type. That is, Java will let you assign it to variables of any object type, return it from methods whose return type is an object type, or compare it to any other type of object. You may have noticed that we just said any object type, not just any type. Java has two categories of types: primitives and objects. Primitive types cannot be null. You have seen four primitive types so far: int, double, char, and boolean. There are also four others you have not seen: byte, short, long, and float. These types are all built into Java, and they are just plain data: they do not have any methods associated with them, and they cannot be null. The other category is object types, which can be null. You have seen many object types so far: FileResource, String, CSVRecord, and Pixel, just to name a few. In general, anything with a method is an object type. Likewise, any class you write is an object type. There are some other differences between primitives and objects, but they are not relevant yet, so you will learn about them later.
Now that we have developed the algorithm for finding the hottest hour, we are going to write the code. I have already set up the class for us with the imports that we are used to having and the class name, and the method that we are going to write is this hottestHourInFile. We are going to start the process with the pseudocode that we developed by the end of the last video. At first it says start with largestSoFar as nothing. We now know that nothing in Java is represented as null, so I am going to go ahead and set largestSoFar: I am going to give it an initial value of null. So now it starts out with null.
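The rules about null described in the lecture can be sketched with standard library types only. The class NullSketch and the method firstLongWord are hypothetical names introduced for this illustration; they do not appear in the course code.

```java
public class NullSketch {
    // Returns the first word longer than minLength, or null when no such word exists.
    static String firstLongWord(String[] words, int minLength) {
        for (String word : words) {
            if (word.length() > minLength) {
                return word;
            }
        }
        return null;                       // "there is no such thing"
    }

    public static void main(String[] args) {
        String text = null;                // null converts to any object type
        StringBuilder builder = null;      // ...including a different object type
        // int count = null;               // does not compile: primitives cannot be null

        String found = firstLongWord(new String[] {"a", "bb"}, 5);
        if (found == null) {               // comparing against null is always allowed
            System.out.println("no such word");
        }
        // found.length() here would compile, but throw java.lang.NullPointerException
    }
}
```

Checking for null before calling a method, as the if statement does, is what keeps the program from crashing.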
So that is its initial value, and it is a CSVRecord, which is what we are planning to return. Then I am going to come down: it says "for each row in the CSV file", and that again is our familiar pattern of iteration that we have been using throughout this course. We are going to use the parser that has been passed in as the parameter as the CSV information that we are looping over in the iteration. I am going to go down to the bottom here and go ahead and close this loop, so that again I have the idea of what I am going to do for each record as we iterate over it. Next it says "if the largest so far is nothing". This is the first time that we have seen it, and we are going to check to see if largestSoFar is null. Again, this is just like we saw in the last video: we are checking to see if the value is null. If it is, we are going to assume that the current record is the hottest one, so we will assign that: largestSoFar gets currentRow. That is all we have to do for that if statement; we are just replacing it. Otherwise, and that means else in Java, I am going to compare the two temperature values directly. That means I need to get those temperature values out of the record. I know that those values are going to be numbers, but I have a decision to make: should they be integers, whole numbers, or should they be doubles, real numbers? That is right, they should be doubles, because one of the temperatures was 30.9 in our example, and that is a real-valued number, which is better represented as a double. So I am going to set currentTemp equal to currentRow.get("TemperatureF"). But getting that value is going to give me a string, and what I really need is a number, so I need to convert it from a string into a double. I will use Double.parseDouble to make that happen. Then, because I need to do the exact same thing again, I am just going to go ahead and paste that
line, and I am going to call this largestTemp, and I am going to get it from largestSoFar. Now I have two double values and I can compare them directly. I am going to check if currentTemp is greater than largestTemp, and if it is, then I am going to replace the value in largestSoFar, so I am going to say largestSoFar gets currentRow. So I have replaced it in the two cases we had checks for, and I am done with my pseudocode for inside the loop. I am going to move to the next line, which is outside the loop, and it said that largest so far is the answer, so I am going to return largestSoFar. Now I am going to compile my code just to make sure I haven't made any silly errors. It says no syntax errors, so now I am ready to test. Let's take a quick look at how the data is organized on the file system and what an actual file looks like. We have a data folder, and inside of that data folder it is organized into years, and then within each year we have a weather file for each day, so there are 365 or 366 of those files, depending on leap year, for each of the years. That is how it looks on the file system, and here is how an individual file looks. In this case I have chosen January 1st, 2015 as the data file that I am going to test, and you can see we have got the column information up top and we have got the column data for each of those. We have two files loaded up in a spreadsheet, because CSV files are easily represented as spreadsheets, so this is a nice way to visualize them. The other method that I have already included in the CSVMax class is testHottestInDay, so we can test our code to make sure that it is right. The first line in there creates a FileResource already set to January 1st, 2015. The second line calls our hottestHourInFile method that we just wrote, passing it the parser that we created for that particular data set, and that returns us the CSVRecord that is the largest one, and then we print out that
hottest temperature and the time that it appeared at, just to see when that was, and that is going to give us a sense of whether or not our program is correct. So I am going to compile our code, come over to the BlueJ environment, create a new CSVMax object, and then call my testHottestInDay, and it says the hottest temperature was 51.1 at 2:51 p.m. If I go over to my data file and look at the temperatures, I see that they are in the 20s and the 30s, 50, 51.1, back down to 50, back to the 40s, and 42.1, so indeed the hottest temperature was 51.1, occurring at 2:51 p.m. Let's also check the 2nd of January, just to try it on a second data file to see how we are doing. I am going to come over and change that from January 1st to January 2nd, recompile, create a new CSVMax, and call testHottestInDay. This time it thinks it was 54 at 12:51 p.m., and again, if I come over and look at my temperatures, I am in the 40s, I am now in the 50s, 51.1, I have 54s, and then back down to the 40s. You will notice that there are actually four times when it was recorded as being 54 degrees, and we have recorded the first time that that occurred, 12:51 p.m., as opposed to the last time that occurred, 3:51 p.m. So now that I have tested it on two separate files and I have seen it work for those, I feel reasonably confident that my code is correct, but I encourage you to test it on more files and try it out on your own and see what you get.
Now that we can find the hottest temperature on a single day, let's look over a range of days and find the hottest temperature over that range. For this I have created a new method called hottestInManyDays that we will use to do this calculation, and we will be using a DirectoryResource for this, which will allow us to select any number of files at once to compare. As we have done for many examples, we will create a File, and that will be what the DirectoryResource returns when we say selectedFiles, and we will iterate over that.
And now that we have that file, we will use it to create a new FileResource, and now that we have that FileResource, we can call the hottestHourInFile method that we created earlier with the CSV parser, just like we did in our test, actually. So this code right here that we have already got in our test is exactly what we are going to do, but now we are going to do it inside of a loop so that we can do it as many times as needed. But instead of this one being the largest, this one is only going to be the current temperature, and we are going to have to compare that against the largest so far. So just like in the previous example, we are going to have to keep track of what we think is the largest one so far. To do that, we will create a CSVRecord largestSoFar (I will even use the same name), and initially it will be set to null. Then, as I am looking through the loop here, I am going to check to see if largestSoFar is empty, I mean we haven't assigned it yet; if so, I will go ahead and assign it to the current one that we just got. Otherwise, I am going to have to compare the two of them again, and since this code is going to be very, very similar, I am actually going to go ahead and literally copy and paste it from my previous implementation. I am going to call this one currentRow as well, again so that we have a similarity of names. So I am going to get the current temperature and save that as a double, currentTemp, I am going to get the largest temperature out of largestSoFar, I am going to compare them, and if it is greater I am going to go ahead and replace it. Then I am going to close my loop, and so now I have iterated over many days. The main difference is that I have called hottestHourInFile instead of working with a particular row. Once again, largestSoFar is going to be my answer, so I am going to return that at the end. I am going to compile my code just
to make sure that I didn't make any silly mistakes, and it compiles, so now let's go on to testing. For testing, once again I have created a test method, testHottestInManyDays. In this case I call the hottestInManyDays method. It takes no arguments, so there is nothing to pass, but it returns a CSVRecord for me to use. I print out what the hottest temperature was, and I also print out what day it was. I have changed the field that I get to DateUTC from TimeEST, since it could be any day now and I want some extra information: I want to know what day it actually occurred on. So I am going to go ahead and compile this and come over to the BlueJ environment. I am going to create a new CSVMax object, call testHottestInManyDays, and choose the first two days from 2015 just to test, because I have tried those two days before, so I know that the hottest temperature for January 1st was 51.1 and the hottest for January 2nd was 54. So I would expect that the answer is 54 on January 2nd, and sure enough, that's what my answer is. Now that I have some confidence that it's working, I am going to try it on a bigger data set. So now I am going to test for many days, and I am going to see what it was like for the whole of the year 2014, which is the last year for which we have complete data. Now I am going to go ahead and say open, and apparently that was 98.1, occurring on July 8th at 10:51 p.m. So now I feel somewhat confident that my code is working: I tried it on a small example, just two days, and then I scaled it up to try it on an entire year, something that would be very difficult for me to test by myself, but that smaller data set gave me some confidence that my code was working correctly on the larger data set. Technically my code is done, but I feel a little uncomfortable about the amount of duplicated code that occurred in both of the methods that I wrote, and I know that it was duplicated because I copied and pasted between the two methods, and in
programming that's not a good way to create your programs, because if you have a problem in that piece of code, it now appears multiple times and you have to try to fix it in all those different places. So what I'd like to do is factor out that common code and put it in its own method, which here I've named getLargestOfTwo, so that I can reuse that code in all of my other methods. The part that I copied and pasted was this if/else statement that appears both in hottestInManyDays and in hottestHourInFile. So I'm going to go ahead and cut that code from that method and put it into getLargestOfTwo, including at the end returning largestSoFar so that I have the right result. I don't have to make any changes to the code, because in my particular case I had everything named the same: both of them call the current one currentRow and the largest one largestSoFar. largestSoFar should remain the same unless it's null or unless currentTemp is greater, so in those two cases I should update largestSoFar, but otherwise largestSoFar is the correct value. So that's what my method is going to look like: it's just going to be those two, the if and the else statement, and a return of its own. Now I can use that here to say largestSoFar gets getLargestOfTwo, and I'll pass it currentRow and largestSoFar. That will do the work of my if/else statement and store the result there. Likewise, I can now copy and paste this one line of code into hottestInManyDays and replace again that same if/else with that one line. So I get the current row from the file and I get the largest of the two. Now let's go ahead and compile and make sure that I didn't misspell something by not capitalizing it, and since I copied and pasted that, I have to fix it twice, which is exactly what I was warning you about earlier. So now, hopefully, since I fixed both of those, it will compile, and now we have to go back and test it just to make
sure that everything is correct. So I'm going to create a new CSVMax, because even though I moved code around and it shouldn't have changed the functionality, I want to actually make sure that that's true. I'm going to test the hottest of the day, and sure enough it gives 54 for January 2nd of 2015. Then I'm going to call the hottest-in-many-days test, and I'm going to go back and select those two days that we've been using for our tests, and sure enough it still thinks it was 54 degrees. So with those tests I feel good that the changes that I made in terms of moving code around didn't affect the functionality, but now I feel more confident in my code, because this large piece of code that was duplicated I've now taken care of, and it only appears in one place, so if I ever find a problem with it I can go back and fix it just in that one place.
OK, so now you know how to solve the problem of finding the row with the maximum temperature in many CSV files. In this lesson, you learned how to convert strings to numbers with either Integer.parseInt or Double.parseDouble. You saw another example of breaking a larger problem into smaller problems, as you first solved finding the maximum in one file, then built on that solution to find the maximum in many files. You learned how to devise and implement an algorithm to find the maximum element in a set of data, and you also learned about the concept of null, which is how you represent nothing, or no such object, in Java.
Hi everyone, I'm Susan Rodger, and today we are going to make Java cookies. We're going to start by putting in two sticks of butter, and that is a cup of butter. Next up, we're going to add in a cup and a half of sugar. Now we're going to cream the butter and the sugar together, so I just kind of use the back of the spoon and press it down a lot. All right, we've got the butter and the sugar creamed together really well. Next we're going to add two large eggs. I'm just going to crack them, and I like to crack them
into a cup first. We'll just pour them in there, and you stir it a lot until it has a consistency like this. One teaspoon of vanilla extract, so you stir that up a lot. You could use a mixer if you want, but I just like using a spoon. Next we're going to add three cups of all-purpose flour: there's one, there's two, and there's the third cup of flour. And a half teaspoon of salt and a half teaspoon of baking powder. At some point you just have to get your fingers into it and squeeze it together with your hand, and then when it's forming big chunks of dough, you can make three dough balls like this. Here's the second one, and here's the third one. Then I get some plastic wrap and I wrap the three balls, and I'm going to freeze them for about an hour. I need the dough to be a little bit harder in order to roll the cookies out. Now we're going to bake the cookies. I have printed out the Java logo and then I cut it out. We have the dough here, and I'm going to put down some flour, and then I'm going to also put a little bit of flour on my rolling pin. You might want to cut the dough in half; it might be easier to work with. I'm going to pick up the dough and put it on the tray, and then I take my cut-out Java logo and put it there, and I get a knife and I'm just going to cut it out. Here we go. A 3D cookie cutter would be a lot easier, but I used to make all my cookies this way before we had 3D cookie cutters. Then I just pick up the dough. Now I'm going to roll out more dough and make another cookie. That's two cookies. And that's five cookies. So we've got five Java cookies, and we're going to put them in the oven at 350 degrees and bake the cookies for about 15 minutes. The cookies are ready, let's get them out of the oven. Oh, they smell really good. Mmm. There's one, and there are five Java cookies. We'll let them cool and then we're going to decorate them. Now we're going to decorate the cookies. We want the cookies to look like this. I'm going to use white icing. Just get a knife and
I'm going to stir it up. I'm going to put a little bit of the white in here, and I'll use that to make red icing, and then I'm going to also put a little bit in this bowl to make blue icing. So now I'm going to make some red icing, and I use the Wilton icing colors; they're gels. I'm going to add a lot more red; you have to use a lot of red. Now that's a good red color. I'm going to make blue, so we'll use royal blue, and that's a pretty color blue. Now, with the red and the blue icing, we're going to put them in a bag so that we can squeeze them out. If you don't have one of these, what you can do is get a plastic baggie, put your icing in there, and then just cut off the corner when you're ready to squeeze it out. I use a cup here, and then here's our blue icing, and the blue icing is ready to go. Now we'll do the red icing, and the red icing is ready to go. Now we're ready to ice the cookies. We'll just get some white icing on it and spread it out, and we've got one cookie ready to go. And that's six cookies. Now I'm going to add the blue icing, and I put the picture in front of me so I can kind of see what I'm doing, and we'll just put it on there. There's one. All right, we're done with the blue now, so let's add the red. And we are now done with decorating the cookies. I hope you have fun making your own Java cookies, and if you do, please send me a picture. It's been a couple of hours. There's our Java cookie, and the best part is getting to eat the cookie. These cookies are really good. I hope you enjoyed the video. The end. The cookies are all gone.
Hello. You're going to learn the background and coding needed for the mini project for this course. You're going to write programs to answer questions that are very hard to answer using a spreadsheet, but that you'll be able to approach in a straightforward way using the practices, skills, and libraries you've learned about in this course. You're going to answer the question: what's your name this year, your name in the year you
were born? In other words, what name this year has the same popularity rank as your name did the year you were born? You can answer this question for anyone's name, so you can use a friend's name, a singer you like, or anyone. These pictures we will be using to describe what you're going to be doing are taken from a website that offers something similar to what you'll be doing in this mini project. Suppose you are a female named Jennifer, born in 1994. Jennifer was the 21st most popular girl's name in 1994, so one task you'll need to do is write code to figure out a name's ranking for a given year. If you were Jennifer in 1994, what would your name be today? You'll discover via coding that Grace is the 21st most popular name today, where today is 2014, the most recent year for which we have data on names given to babies in the United States. So today your name would be Grace if you were born in 1994 and named Jennifer. But you can also see what your name would be in any given year. Here we see a summary showing your name in several different decades. Jennifer in 1994 would be Barbara in the 1970s; that means Barbara was the 21st most popular name in the decade of the 1970s when taken together. Because we have data for the United States going back to the 1880s, more than 130 years ago, we can determine your name going back a long time. Here we see that your name in the 1900s would be Sarah if you were Jennifer in 1994. You'll need to write programs to make conclusions about the names that we've outlined here, turning all that data into information. The US government releases baby name data every year. We've collected that data and made it available to you as part of this course. We would be excited to have your help to collect similar data for other countries in the world. We have data for males and females going back many years, with a different data file for every year. The files share a naming convention, which is convenient when writing programs to open the files and read them. You'll leverage
the common naming convention in writing code to access these hundreds of data files. The file contents are also similarly formatted, which will help you write more general code to solve these problems. We'll look briefly at the data file for 2014, the most recent data provided to you for use. The lines in the file are ordered by the number of babies with a given name, so that the most popular baby name comes first, then the second most popular, and so on. 20,799 babies were named Emma in 2014, making it the most popular female baby name that year. All the female names precede the male names in the data file. This means the most popular boy's name, which was Noah in 2014 with 19,144 boys named Noah, comes just after the girl's name that the fewest girls have, which was Zeriona in 2014. We'll walk through the high-level concepts for accessing data in one file, but you'll need to go through the full seven steps to solve the problem we're asking you to address in this mini project. You'll use several classes from the edu.duke and org.apache.commons.csv packages in developing your solution to this problem. For example, you'll need a FileResource object to access the data in a file for a particular year. You'll need to get a CSV parser from the FileResource, making sure that you ask for a parser with no header row, because there is no header row. You'll need to access the data in each record by indexing: the first data element, with index 0, is the baby's name in the file, and the gender of the baby is the second data element. Once you've got this, you'll be ready to start thinking about the problem and using our seven-step process to solve the entire problem. Have fun!
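As a rough sketch of the file layout just described, the following plain-Java example reads header-less lines by field index. It deliberately does not use the course's FileResource or CSV parser; String.split stands in for them. The class and method names are hypothetical, the gender codes "F" and "M" are an assumption about the data files, and the count on the Zeriona line is a made-up placeholder. Only the Emma and Noah counts come from the lecture.

```java
public class BabyNameLineSketch {
    // Returns the name (index 0) on the first line whose gender field (index 1)
    // matches, or null if no line matches. Because each gender's lines are ordered
    // by popularity, the first match is that gender's most popular name.
    static String mostPopular(String[] lines, String gender) {
        for (String line : lines) {
            String[] fields = line.split(",");   // no header row: every line is data
            if (fields[1].equals(gender)) {
                return fields[0];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Emma,F,20799",      // first line of the file: most popular girl's name
            "Zeriona,F,5",       // last girl's name; the count 5 is invented
            "Noah,M,19144"       // boys' names start only after all girls' names
        };
        System.out.println(mostPopular(lines, "F"));   // prints Emma
        System.out.println(mostPopular(lines, "M"));   // prints Noah
    }
}
```

The count in the third field (index 2) is still text at this point and would need Integer.parseInt before any arithmetic.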
Now that you've learned about the problem that you're going to be solving for your mini project, let's take a look at the data files and some code for working with those data files to get us started in working through the problem. For this first example, I'm just going to go ahead and print out some basic information about what's in the data file so that we can get to know it and make sure that we understand what's going on. I'm first going to make a FileResource and choose which file I want to open, and then I'm going to create a CSVRecord and iterate over all of the records in the file by using the CSV parser that we've seen several times. But in a new twist, as we saw in our video, we're going to pass the value false when we create the CSV parser, and that means that this CSV file does not have a header row. In other words, the very first line of the file is actual data that we're going to be using instead of a header row. For each record that we iterate over, right now we're just going to print out the basic information about it in a nice format so that we can read it, so I put spaces in between the fields. And again, which is different than we've seen before, we're going to be accessing our information using numbers instead of names, and that's because we don't have a header row. So we're doing it by index, where zero is the first one, one is the second one, and two is going to be our third one, which is all of the fields that we're going to have in our data. So that's it for our first version of this program. We're going to compile it and run it on a data file that I've created that is meant to be a very simple example just to get us started. You can see that Emma is the name right here, Emma's gender is female, and there were 500 girls named Emma born in this example. The second one is Olivia, female, 400 born, all the way down to Ava, who was the last-ranked girl with 100 born. Noah is the first male
born, with 100 born, so he has the most babies born and is ranked number one in terms of male births, even though he appears as the sixth name in the file. So it's all females first and then all males following that, according to their rank. So one, two, three, four, five in our example file, one, two, three, four, five in our example file: we have five girls and five boys. You can see the actual data file that I've created here in this spreadsheet where I've loaded the data, and again there are five females and five males, and this is what the data file would look like on a small scale, so we have some numbers we can test with. Here's an actual data file; this is for the births in the year 2014, and you can see that Emma again is the most popular, but with 20,799 births instead of the 500 that I made up in this particular example. I've gone ahead and calculated the totals just to give us a sense of how large this file is, and you can see from the total names there are 33,044 different names in this file. That means there are 33,044 different lines in this file, because each name occurs on a separate line. Of those, 19,067 are girls' names, so that means the first 19,067 lines in the file are the girls' names, and then the 19,068th name in the file is Noah, who is the highest-ranked boy for this year, and he would be the first of 13,977 boys' names that appear throughout the rest of the file. So we're going to be working for our examples with this small file right here, just so we can get a handle on it, because 19,000 and 33,000 are big numbers that I can't really work with and test very easily, but we're going to use these for our testing purposes. So, just again as a simple little thing to try, we're going to go ahead and only print out the names if the number born is under a certain value. I'm going to go ahead and create an integer from the string that we get back for the number born, and that's going to be the piece of information at index 2, and then I'm going to check
if that number born is less than or equal to, say, 100, and then print out only those names whose number born is at most 100. Let's see what those are going to be. Now when I compile and run it with that same small data file, I see that I get Ava, who was the last ranked girl, and all of the boys' names that were in there, because all of the boys' names had a number born of at most 100. Now that we've seen the basics of working with these files, let's go ahead and solve a real problem: calculating the total number of babies born, the total number of boys, and the total number of girls. In this case I've created a new method, totalBirths, for us to work in. I'm taking a FileResource as a parameter instead of having us choose a file; that'll make it easier for us to test and work with later. But I'm still going to use the same basic idea that I used in the previous problem of looping over all of the CSV records in the file, and I'm still going to pass false because the file has no header row. So now what we're going to do is check the number that were born each time, but we want to add that to a running total of how many were born. So we're going to create a variable called totalBirths, and we're going to add to that the number born at this iteration of the loop. The problem with writing this piece of code here is that we can't both declare a variable and add on to it for this iteration of the loop on the same line. So we have to declare the variable someplace else, and the right place to do that is at the top of the method, before the loop has started, with an initial value of zero, meaning we haven't seen any births. Then at each iteration through the loop we add to totalBirths the number that were born for that particular name. At the end we're going to print out the total number of births to verify whether or not we got the right thing, and run our simple little function just to see if 
it's going to work. Now in this particular case, since we've been passed a FileResource, we need to have a test method that chooses which resource we're going to be using. We're not going to have a dialog box pop up this time, because we already know what we want to work with: we've got that nice small example file. So I've already put that information in there to make sure that we're going to be able to see our output. Since our code compiled, I'm going to go ahead and create a new instance and call testTotalBirths, and I get that the total births were 1700. Again, if I look back at our example file, I'm adding 500 plus 400 plus 300 plus 200 plus 100, 30, 30, 20, and 10. I've added all these numbers up. I don't trust myself to do math on the fly, so in this spreadsheet I've created a formula that will do the calculation for me, and it came up with the number 1700 as well. So I feel pretty confident that this basic piece of code is doing what I want. Let's revisit this code and look at the total number of boys born. In order to do that, we're going to have to divide up the children based on their gender. The way that I'm going to do that is to check whether the gender is a certain one. So I'm going to say if the current record's get(1), because again the second field is what represents the gender, equals male, say, for example, then I want to add the number born to a total for the boys, so plus-equals the number born, and otherwise I want to add it on to the total girls instead. Obviously I need to create those variables, and again, on the assumption that only whole numbers of babies were born each time, I'm going to go ahead and make those ints, and I'm going to initialize all of the variables up here. Then again I'm going to add a print statement down here to make sure that we're checking our information. I'm going to compile quickly to check that I didn't make any silly mistakes and try 
testing this again, and now I get 1700, 1500, and 200. So again I feel somewhat confident that it seems to be working, but just to make sure, I'm going to try it on a larger data file. Instead of the small example file, I'm going to try it on the year 2014 and run that one, and in that case I get 3,670,151, 1,768,775, and 1,901,376. I've done the same trick in my spreadsheet over here, letting it do the sums, and I see that I get the same numbers for total births, girls' births, and boys' births as what I calculated in my program. So again I feel relatively confident that my solution is correct. An interesting question about this code is: does it matter how the boys' and girls' names are organized in the file? That's right, it doesn't for this code. Since we check the gender each time, I could have the boys' names first and then the girls', or I could even have the names interleaved. But in your files all the girls' names are going to be first and all the boys' names are going to be second, and that's going to determine how you figure out what their ranking is. Since we weren't worried about ranking for this particular program, we didn't have to worry about that distinction, but you will in your code. Good luck with the mini project. Welcome back. Let's look at the problem of converting images to grayscale. This is an example of solving a real problem by writing two programs and combining them into a single program. Why might you want to convert an image to grayscale? There are several reasons. You may want to see how images would look if you print them in grayscale. Grayscale printing is much cheaper than color printing, and some publications require that all images be converted to grayscale. Or you might 
be planning to do some other, more complex image processing; that processing could be simplified or even sped up by working with grayscale images. If you just need to convert one image to grayscale, the easiest thing might be to use an image editing program you already know. You'd open the image you want to convert, and then you'd use the program to create a grayscale copy. But what if you need to convert many images to grayscale? It can be quite tedious and time consuming to open each image up, transform it to grayscale, and then save it. For a few images this may not be a big problem, but what if you need to operate on a thousand images? It would take days to do this by hand, if you could even make yourself complete this repetitive task over and over and over. Instead, you could write a program to convert many images to grayscale. In particular, you might ask the user to select some group of images to convert, perform the grayscale conversion on each of the selected images, and then save the results using file names that are similar to the originals. In our example, you'll add a gray- prefix to the beginning of each image file name to distinguish the new grayscale copy from the original. This is exactly what we're going to work with you on in this lesson. In particular, we're going to break this large task down into a few smaller tasks. One aspect of the problem is to allow the user to select a group of files and to do something to each of the selected files. While you ultimately want to convert image files to a grayscale version, we'll start out by simply printing the selected file names, a small step toward the solution to the larger problem. Next, you'll work through how to convert one image to grayscale using our seven-step process. After that, you'll combine these first two ideas and programs into a single program that allows the user to select many files and convert each of these files to grayscale. Finally, you'll make your program save the results to new files with the appropriate file 
names. Welcome back. You're currently learning how to convert a bunch of images to grayscale. An important component of that process is the ability to convert one image to grayscale. In particular, you want to take a color image and produce an image that looks like it but uses only shades of gray, not any colors. As with all programming problems, the way you want to approach this problem is with the seven-step approach you've learned previously. The first step is to work an instance by hand. As always, we must work with a small, manageable problem size; here we picked a two-by-two image to work with. We're going to want to make a two-by-two image for the output, but how do you figure out what shade of gray to use for each pixel? You need domain knowledge before you can proceed. In this particular problem, the knowledge you need is about colors or graphics. The first thing we need to know is what precisely a shade of gray is: a color is a shade of gray if its red, green, and blue components are all the same. However, this knowledge by itself is not sufficient to tell you how to come up with the shade of gray for a particular color, just that the result needs to have the red, green, and blue all the same. One way you could do this is to average the red, green, and blue components. Or you might decide you want a weighted average, because the human eye does not perceive all colors in the same way. There could be more complex alternatives; however, taking the average works pretty well and is simple. Now you have the domain knowledge required to do this problem yourself. You can look at the RGB values for a pixel, compute the average, and color in the output pixel appropriately. Then you would go through looking at the RGB values for each input pixel, computing their average, and coloring in the output pixel accordingly. Once you have colored in all of the pixels, you've worked an instance of the problem by yourself and are done with step 1. The next thing you need to do is write down exactly what you just did. I started 
with the image I wanted, which we called inImage. Then I made another image of the same size, which we called outImage. I computed 255 plus 0 plus 0, divided by 3, which was 85, and I made the first pixel of outImage have red, green, and blue values of 85. Then we went through each other pixel, computing the average of the RGB values and coloring the output pixel accordingly. Once we finished this, we had these 10 steps that we used to solve this particular instance of the problem. Now you are ready to move to step 3 and look for patterns and repetition. You can see that we are doing very similar things to each pixel, but they are not quite the same. We need to find the patterns in the numbers to generalize these steps to any image. Let us look at the particular numbers we need to generalize. Why did we use 255 here and 0 here? These numbers were all the red component of the corresponding pixel in inImage. What about these numbers? Similarly, these were the green component of the corresponding pixel in inImage. Lastly, these numbers were the blue component of the corresponding pixel. Next, you should give a name to the result of this math. It won't always be these particular numbers, and you will want to be able to refer to it precisely. We will call it average. OK, now that we have thought that through, you can write the general algorithm. Notice how we thought about what we do for each pixel and wrote down general steps. Now we can write this in terms of steps to do to each pixel in the output image, and it can work for any size image with any colors. The last thing you should do before you write code is to test your generalized algorithm out on another small input. Here is a small image and the RGB values for each pixel. Take a moment to execute the algorithm and see if you get the correct answer. Yes, the answer is right, so you are ready to implement it in code. Now that we have developed our algorithm to convert to grayscale, we have started here with a class for the grayscale converter, and we have already 
imported edu.duke.*, and before this video started I went ahead and wrote in the steps that we came up with as comments in our code to guide us as we write. The first thing we did in our algorithm was start with the image we wanted, inImage. In this case that is going to be a parameter that we pass to our function, so that this function can operate on any image we want. The next thing we did was make a blank image of the same size as inImage. So if I want to make a new image, I am going to declare an ImageResource; we called it outImage in our algorithm. Since I want to make a new thing, I am going to say new, and the type of thing I want to make is an ImageResource. When I create this ImageResource, I need to pass in information, in this case telling it how big I want my image to be: the width of inImage and the height of inImage. The next thing we did in our algorithm was say that we wanted to do something for each pixel in outImage. That is going to be a for loop, which you have seen before: for Pixel pixel in outImage.pixels(). We are going to put curly braces around all of the steps of our algorithm that we want to do for each pixel, and you can see that BlueJ shaded this whole block in one color, so we can easily see that these all are going to happen for each pixel. The next thing we did in our algorithm was look at the corresponding pixel in inImage. So what we are going to want to do is call inImage.getPixel at the same location, pixel.getX() and pixel.getY(). Now when we do this, we need to give it a name so we can use it again. That is going to mean we are declaring another variable, so it is going to be Pixel inPixel equals inImage.getPixel. The next thing that we did was say we wanted to compute inPixel's red plus inPixel's blue plus inPixel's green, divide that by three, and call the result average. So I want inPixel's red, which is going to be inPixel.getRed(), plus inPixel's blue, inPixel.getBlue(), plus inPixel's green, and 
then I want to divide by three. Now if I write divide by three here, I have made a small mistake, because order of operations is going to make it inPixel.getGreen() divided by three. I want to divide the whole thing by three, so I need parentheses. We want to call this average, so we are declaring a new variable. What type is this variable? Well, it is going to be just a plain number, so int average equals all of that math. Now I want to set pixel's red, so pixel.setRed to average, and similarly pixel.setGreen to average and pixel.setBlue to average. Those are all of the steps I wanted to do for each pixel, and then our last step at the bottom here says outImage is our answer. Whenever we know the answer to a function, we return that answer, so in this case we just return outImage. So now I am going to come up here and click compile. Class compiled, no syntax errors. You have frequently been testing your code by just making an object in the BlueJ main window and calling its methods, but making an ImageResource there is a little bit tricky, so we are going to write another method to help us test this out: public void, let us call it testGray, which is going to take no parameters. It is going to make an ImageResource, which we will call ir, with new ImageResource(), and that is going to pop up a dialog and ask us what image we want. Then we are going to call makeGray on this ImageResource, and then we are going to draw the result. Now if we had forgotten the name of the method we just wrote, we could scroll up and look, but I just remember it is called makeGray. I am going to compile again. No syntax errors means our code obeys the rules of the language; we do not yet know if our code works. Now I am going to come over here to the main window and hit new GrayScaleConverter, which is going to give me this object in BlueJ. Now I am going to hit testGray. It is going to pop up a dialog asking me what image I want. I am just going to choose these nice colorful Easter eggs, but when I run it 
through my GrayScaleConverter, I can see that it comes out in gray. Since my code passed this test case, I am more confident that it works, and in general, the more test cases we run, the more confident we will be in our code's correctness. Having written the code to convert one image to grayscale, we would like to go one step further and convert many images to grayscale. We have started here with the class we wrote in the previous video, in which we implemented our algorithm. I have added a method at the bottom, selectAndConvert, which is going to let us select a bunch of files and convert them all. You have already seen code that iterates over a selection of files and just prints out their names. Now we are going to combine these two ideas to give us code that iterates over a bunch of files and converts them all to grayscale. So we are going to start with a very similar structure here. We are going to create a DirectoryResource, which is going to be a new DirectoryResource, and we are going to say for File f in the directory resource, dot 
and if you don't remember this name, you can look at the API or look at the code we did before: it is going to be selectedFiles(). What we did last time was just print out f. What we are going to do this time is convert the image corresponding to that file to grayscale. So I want an ImageResource, inFile, which is going to be a new ImageResource, and I would like to create this from f, that is, make an image by reading in this file. Then I would like another ImageResource, which we are going to get by calling makeGray, the method we wrote before, passing in the input image. I called it inFile, but I would really prefer to call it inImage, since it is actually an image. Then the last thing we are going to do is just draw the gray image so we can see something happening. So I am going to compile this, and it tells me it cannot find symbol: class File. What we need to do to use File is import java.io.*, because that is the package in which File lives. Now it compiles fine, and I am going to go back over here to my main BlueJ window. I am going to make a new GrayScaleConverter, click on it, and choose selectAndConvert. It pops up this dialog, and if we navigate through here to somewhere where there are images, we can now pick some selection of pictures that we would like to convert. You can see that it went through and converted each of these pictures into grayscale and drew the resulting image. I think it is still converting one of them; the last one was really big. All right, so now we have put those two ideas together: we can convert any set of images you select. The next step is going to be to save these images to files. Welcome. Today I am going to show you how to make copies of images by writing a program to do so. I have got a folder of images on my computer, and I would like to read some of those images in and make copies of those files. So we are going to do that 
with a program. I am going to create a new class, and we will call it ImageSaver. We will go ahead and open the editor, and let us make it bigger. What we will do is create a method in here called doSave. We are going to use the DirectoryResource that we learned about, so I am going to create, if I can spell it right, a variable called dr of type DirectoryResource, and I will have to create a new DirectoryResource. Then we would like to loop over the selected files from there, so we will create a for loop variable f, which is of type File, and we will use the selectedFiles method from the DirectoryResource. Let us see what we are going to do with our files. Now what we are going to do is create an ImageResource. I am going to create a variable of type ImageResource called image, and for that we will have to create a new ImageResource using the variable f from our for loop. I want to just make sure this works, so what I am going to do now is have the image draw itself, so we will just draw it on the screen and see if this works. I am going to try to compile it, and nope, I have got an error, so let us see what it is. It says reached end of file while parsing, and it has got this highlighted, so I forgot to put in another curly brace. I am going to add another curly brace; I need one for my method doSave. So let us put that in there. That looks better, and you can also tell from the way the colors are lined up that I was missing it. So we will compile it. It is still not compiling, because it does not know what DirectoryResource is, so I have to import the edu.duke library. I will add that right here, import edu.duke.*, and that should fix that error. Let us compile again. Now it cannot find File, so I am also going to have to import from the java.io library, and I will just go ahead and specify that I want to import the File class. Let us see if that 
compiles. Yay, no syntax errors. OK, so now we can come over here and actually run this. All this is going to do, hopefully, if it works, is display some images that I am going to select. So we will go ahead and create a new ImageSaver, and then we will run our doSave method. Now I have to find where on my computer I have got some images, so I am going to navigate to here. I have got a folder of images on my desktop; there it is. Now I can select several images. I will just go ahead and select the dinos, and I will select another one, Easter eggs. I could select as many as I wanted to. I will go ahead and open them, and our program should just draw these images, and it did. We saw the two images, so that is great; that seemed to work. I will just get rid of those, and let us go back to our program. We know that we can get the images and display them, so now we would like to make a copy of them, and we are going to add some more code in here. First, we are going to get a file name from the image that we have just grabbed. So I am going to create a string for that file name, and we will just call that fname. The ImageResource method for that is called getFileName, and if you did not know about that, you can look in the documentation, where you will see all kinds of methods that you can use with the ImageResource class. So getFileName should get me a string. Then what we want to do is create a new name, which is going to be like a copy of the old name. So let us create another string variable; this will be called newName. Let us make the string copy-, and then we will add that to the front of the file name, and the rest of the file name will be the old file name; fname is the old name. So what have we done so far? Well, 
let us try to run this and see if it works. I will just compile this. It compiles with no errors, so let us run it. We will come back over here, create a new ImageSaver, and run our doSave method. We can now go pick some files. We have to navigate over to the desktop where my images are, and in the images folder I can grab, let us see, dinos and Easter eggs again. There we go, we open them and we got them. Now I also have the images folder open right here, so we can just look and see. We created a new name that was copy- plus the old name, and if we look in the folder, I do not see any image that is called copy-dinos. So we created a string, but we did not actually use the string to create a new file yet. Let us do that. We have the name, but now we need to use it. There is another method called setFileName, so let us use that. We will say image.setFileName, and what we want to set it to is this new string that we created, which is called newName. That should do it. What we also did not do is save the file. We have drawn the image, but there is also a method in the ImageResource class called save, so let us do that too. We will draw it and we will save it, and that should save the new file. Let us run this and see if it works. We will compile it, and then we will come over here and run the program. We will create a new ImageSaver, and then we will run our doSave method. We have to pick some images to copy, so again I will just pick dinos and Easter eggs, and let us open them. They are displaying again, but how do we know if it worked? Let us go look in the images folder, which is right here. If you look in there, you will see there are two new files, one called copy-dinos and the other called copy-Easter eggs. So it looks like it copied them. Now let us see what happens if we run our program again and we select one of those, or both of those even. So let me get rid 
of these old images, and let us just run our program again. I am running the doSave method, going to the desktop, and picking some images to make copies of. I am going to make a copy of copy-dinos, and let us make a copy of copy-Easter eggs, and we will also make a copy of, let us see here, Roger. So we are going to make a copy of those three files, and let us see what happens. We had all those pictures pop up, so let us get rid of them. If we look over here in the images folder, let us see what we have. We have copy-copy-dinos and copy-copy-Easter eggs, because we made copies of copies, and then we also made a copy of Roger. That is how you make a copy of an image: we grab the file, we create a new file name, and then we save a copy of it with the new file name. We have to give it a new name because there is already a file with the old name, so we added copy- onto the front of it. Hope you enjoyed that. Thanks. Hi, we are now done with this lesson, in which you have written a Java program to convert many image files to grayscale. In this lesson you learned how it can be really useful to break a large problem down into smaller pieces, into smaller programs that you can then develop incrementally, making each program work and then putting the correct, working programs together. You got more practice writing code that iterates over data, first iterating over files that the user selected using the DirectoryResource class, and then iterating over the pixels in an image, and you wrote code to actually convert an image to grayscale. Converting the image to grayscale gave you more practice in solving a programming problem using our seven-step method. Finally, you put the finishing touches on your batch grayscale conversion program by making it save the new images with file names that were similar to the old file names. This is a reasonably complicated program, and you should be proud of having made your way through this lesson. This is Duke University. Hi, I'm Susan, and I'm 
really excited to be working with our team at Duke to introduce this course, Java Programming: Arrays, Lists, and Structured Data. We've designed a great set of problems and programs that combine real-world data analytics with lessons that introduce security and cryptography. Whether this is your first course with our great team or you're coming from our previous courses as part of the Java Programming: An Introduction to Software specialization, you'll use a seven-step process to help you design and implement Java programs to solve problems and hone the skills that will help you become an effective software engineer. In working on the problems we've designed, you'll learn about arrays and maps, two standard data structures that are used to create efficient and robust programs. As part of our cryptography lesson, you'll learn how the words melon and cubed are related by the number 16. Hello, I'm Drew. In this course you'll be using the edu.duke library of classes we've designed to write programs that solve interesting problems like analyzing web logs and generating random stories from templates. But you'll also use the standard java.util library of classes, which will help you grow your knowledge and skill to create solutions to these problems. Understanding APIs so you can use code from libraries is an important part of this course, as is beginning to develop an understanding of object-oriented programming. In this course you'll learn how classes are structured and how programs are created by strategically combining classes together. You'll also learn about how the words fusion and layout are related by the number 20. Hi, I'm Robert. For this course we've designed an exciting mini project to help you learn more about classes, object orientation, and data structures. You will use standard techniques and libraries that are part of nearly all Java programs designed to solve problems at scale. We've structured the modules in this course to introduce topics 
and then explore them in more detail and with more robust, scalable solutions that improve upon an initial program. This allows you to learn the new techniques while solving a familiar problem. We hope that this learning approach will facilitate success for all learners. Also in this course, you'll learn how the number 19 connects the words jolly and cheer. Hi, I'm Owen. I'm excited about the approach we've taken in this course to introduce two important programming structures, arrays and maps. These are not simply Java structures; they're used in every programming language to create efficient solutions to programming problems. By exploring how these structures are related and encountering them in familiar contexts, you'll be able to practice using them as you develop mastery with both the concepts and the Java libraries that anchor these concepts in code. You'll also learn why having 14 fake toys would be an enormous cryptographic coincidence. Welcome to Arrays, Lists, and Structured Data. Welcome back. Today you're going to learn a little bit about security, which will show the importance of the problems you're going to solve in this module. Suppose you want to buy something online. You use your computer, which is connected to the internet, so it can communicate with the servers at the online store from which you're making your purchases. To complete your purchase, you put your credit card or other payment information into your computer. When you purchase the items in your shopping cart, for example, your computer sends that information along with your credit card information across the internet to the online store. But what if a thief is looking at the data going across the internet? This thief might be able to intercept your credit card information and use it to make fraudulent purchases. You certainly don't want that, and neither does the online store; it wants your business. So what makes online shopping safe? What actually happens is that your computer encrypts the information before it sends it to 
the web servers for the online store. The two computers agree on a special piece of data called the key, and then use that with an encryption algorithm to transform the data so that only another computer with the key can decrypt it. Now your computer sends the encrypted data across the internet, and any potential thief is thwarted: the thief can only see the encrypted data and cannot understand what the message says. The receiving computer, which has the key, can then decrypt the data and read the original message. When you are doing anything online, your web browser will tell you if you have a secure connection. For example, Chrome displays the HTTPS in green and shows a green lock icon next to it. The S in HTTPS is for secure; it's a different kind of connection to a web server than the standard HTTP. If you were to click on the lock icon, it would tell you the technical details of the encryption used to secure the connection. There are many algorithms involved in securing your internet connection. An algorithm called AES is typically used to encrypt the data that is sent to the server once the connection is set up. However, your computer and the server must agree on the key in a secure fashion before any information can be sent. While this may sound quite difficult, there are algorithms that can make this happen. In this example, my computer used two algorithms to securely set up the connection with the server, one called Elliptic Curve Diffie-Hellman and one called RSA. These algorithms are very important to the secure operation of the internet, but the math involved is a bit advanced to go into here. To implement these encryption algorithms, you would need to spend several days or months just learning that math, which is not the focus of this course. If you are interested in advanced cryptography, Coursera has some excellent courses on the subject, which you could take after you've mastered basic programming. Even though modern cryptography requires some advanced math, you can, however, 
learn a lot about cryptography by looking to the past classical cryptography, the cryptographic algorithms used in past centuries involve simple mathematics and even predate computers these algorithms are not secure today they can be easily cracked by computers but learning how they work and implementing them will teach you some important lessons furthermore, learning how to break them will teach you a critical lesson do not try to make up your own cryptographic scheme if you need security use a well tested implementation of a modern cryptographic library so how far into history will we need to look to find the first use of cryptography the first known use of something resembling cryptography comes from ancient Egypt 4000 years ago however historians believe that the hiding of messages was not a serious attempt to guard secrets if you look forward a few hundred years to Mesopotamia around 1500 BCE you will find records of craftsmen using simple encryption schemes to guard their secrets when they recorded them on stone tablets going further forward to the Roman Empire the Caesar cipher which you will learn about in the rest of this module is named after Julius Caesar who used it extensively looking forward another 1500 years you will find the Vigenère cipher Giovan Battista Bellaso actually described this algorithm in 1553 but it is named after Blaise de Vigenère, to whom it was attributed in the 19th century this algorithm is historically very important as it was long regarded as unbreakable however in the mini project you are going to write a program to break it continuing forward to the 1940s cryptography was a critical part of World War II and the Allies devoted significant resources to breaking the German codes with the core of that effort taking place at Bletchley Park in England Alan Turing was a leader in this code breaking effort and made many important contributions to computer science in fact he is so important that the highest honor in computer science is called the Turing Award so
now that you know a bit about cryptographic history what will you be doing in this module you are going to learn about the Caesar cipher which you will implement and then break in the mini project at the end of this course you are going to learn about the Vigenère cipher and will also implement it and break it of course all of these problems are going to teach you several important skills that can help you solve a wide variety of other problems welcome back now that you know a bit about the importance of cryptography it is time for you to learn a bit more about the concepts of the Caesar cipher which you will implement in this lesson suppose you were in a battle and you wanted to send a message to your sub commanders to tell the first legion to attack the east flank you don't want your enemies to know your plan even if they intercept this message so you encrypt it with your cipher as shown on the second line the Caesar cipher algorithm is named for Julius Caesar the famous roman emperor who used it the basic idea of this algorithm is to substitute each letter with the letter obtained by shifting the alphabet by a fixed amount that is a specific number of letters later in the alphabet the amount you shift the alphabet by is the key for this cipher Julius Caesar used a shift of three letters prior which if you think of the algorithm in terms of shifting to a later letter will be the same as 23 letters later to see how this algorithm works we will walk through an example of encrypting this message we will use the alphabet to show how letters are encrypted the first letter of the message is F we find the letter F in the alphabet here and then we go backwards three letters, E, D, C so you would write down a C as the first letter of your message the next letter is I we find the letter I in the alphabet here and then we go backwards three letters, H, G, F writing down F as the next letter of the message the next letter is R we find the letter R in the alphabet here again, go backwards three letters Q, P, O
writing down O as the next letter of the message you would continue the same way through the first word and then you would get to a space doing this by hand as Caesar would have done the easiest thing to do is leave the space unchanged and write down a space in your message then the next word and space however, what happens when you get to A we find A here it is the first letter in the alphabet but how do you go three letters backwards you have to wrap around to the end of the alphabet from there you go three letters backwards to Z, Y, X writing down X as the next letter you continue through the rest of the message in the same way and end up with something that is unintelligible under casual scrutiny however, if you know or can figure out the key you can decrypt the message the process is the same as encrypting with a key of 26 minus N so how do you actually do this one way is to do math on the numbers if you took our Coursera course programming and the web for beginners you should remember that everything is a number if you're not familiar with this concept it is very important in computer science computers can only work with numbers in this case that principle says that these letters are actually represented as numbers so you can do math on them in particular you could tell Java to subtract three from the letter F and it would compute the letter C however, what if you subtracted three from the letter A Java would not know that you want to wrap around and stay only within the alphabet so you would have to include some more mathematical operations or a conditional statement to wrap around and get X another way you could do this which makes the wrap around case a bit cleaner is to pre-shift the entire alphabet that is compute the shifts of each letter at the start before you try to encrypt anything in the message for example, you could take the alphabet and for a shift of three to the left compute a string like this one we will see the details on how to do this in a
future video however, once you have computed these strings you can use them to look up the encryption of each letter for the F at the start of the message you want to find F in the original alphabet think for a moment about what you have learned about strings in the past what method might you use to find F once you have found F you look at the letter in the same position in the shifted alphabet which is C then you write down that letter in your encrypted message for A which wraps around to X you do not have any special case again, you just find A in the original alphabet look at the letter in the same position in the shifted alphabet in this case that letter is X so you write down X in your encrypted message great now you know the basic ideas behind a Caesar cipher however, before you implement this algorithm you will need to learn a few new Java concepts you're going to learn some new ways to manipulate strings as well as for loops which count over a range of numbers for loops which count over ranges of numbers are particularly important as you will use the numbers you count to index into data manipulating particular locations in a sequence you are familiar with strings which are sequences of characters and you will see other types of sequences in the rest of this course so you will use for loops a lot thank you welcome back now that you know a little bit about how a Caesar cipher works it is time to expand your Java knowledge so that you can implement one you should know something about strings already if you took our previous course Java programming solving problems with software you learned how to manipulate strings if you are unfamiliar with strings however to perform encryption you need to do something with strings that you did not do in the previous course in that course you analyzed the contents of strings and operated on pieces of existing strings however you did not build up new strings if you carefully reexamine what you saw about how a Caesar cipher works you will
see that you are making a new string by adding one character at a time one way that you can build up a string is with concatenation it is a very fancy word for sticking things together in Java you can perform string concatenation with the plus operator whenever at least one operand is a string if one operand is a string but the other is not then the concatenation operator figures out the string representation of the non-string operand and concatenates that for example if you were to write these two strings with the plus operator between them you would be telling Java to perform concatenation the result would be the string you see here which was formed by sticking these strings together exactly what concatenation means to illustrate the usefulness of this operation think about the Caesar cipher you are working on you might want to take the original alphabet as a string and make a rearranged alphabet based on the key you could do that by taking two pieces with substring and concatenating them together first you would take the substring starting at 23 remember that substring with only one argument returns the substring starting at the specified position and going all the way to the end of the string then you would concatenate the substring from 0 to 23 onto the end of that string now you have two strings which describe the mapping from each plain text letter to the appropriate cipher text letter we did this for the key of 23 but you should think for a moment about how you would generalize this for other keys as you learn more about manipulating strings it is important to know that they are immutable you cannot change an existing string once it has been created instead if you want a different string you must make a new one with that change this concept may seem a bit confusing and subtle so it helps to see it with a picture here we have declared a string and initialized it to hello nothing new or surprising here a common practice is to draw a picture of the effect of this
statement by drawing a box for the variable s and making an arrow pointing to the letters that make up the string hello if you declare another string x and initialize it with s you would have a picture that looks like this both x and s refer to the same string now suppose you did s equals s plus space world that is you compute s concatenated with the string space world you are not changing the existing string but rather making another string which involves copying the existing strings like this notice that x is still hello because you did not change the existing string you made a different string and assigned it to s if you do a lot of string concatenation operations on large strings especially the required copying can be quite slow and inefficient even though we are not terribly focused on the most efficient way to do things yet it is still a good practice to develop good habits if you are building large strings by adding many small pieces you want to use a different approach in fact java has a string builder class specifically for this purpose it provides a mutable sequence of characters meaning it is like a string but you can modify it changing and inserting characters in an efficient way when you create a new string builder you can pass a string in to specify its initial contents there are also many useful methods we will name just a few of the most important ones and then you can consult the API documentation for a full list and more details one useful method is append which lets you put a string on the end you could also pass in other types of data which will be converted to a string before being added to the end you can insert a string or the string representation of other types of data at any location you want you can get or set individual characters by their index the numerical location where they are when you are done manipulating the string builder you will often want to use to string to get the string you have made as before it helps to see a
picture of how these methods operate here we have started by creating a string builder and passing in the string hello we have drawn this picture with sb having an arrow pointing to the sequence of characters in the string builder now if you call sb.append and pass in world you will modify the existing sequence of characters notice how we change the existing sequence rather than copying them into a new sequence you could also insert or put the characters into the middle which would still modify the same sequence of characters like this great now you know how to build up strings from smaller pieces as you get ready to solve problems that are slightly more complex you will learn some new programming constructs to help you will learn about counting loops you have used different kinds of loops in solving problems before you used a while true loop in finding codons in dna or tags in a web page and you have used a for each loop with an iterable many times as in reading lines in a file or processing pixels in an image you have also used indexes to access and reference parts of a string you know that indexes start with zero in java so when you use dot index of you know it returns zero when a match is found at the first character in a string both dot index of and dot substring use indexes to access the elements or characters of a string we will look at code to access individual characters in a string we will do this in the context of looking at the reverse of a string where the reverse of cgatgc is cgtagc and the reverse of tip is pit scientists studying genomes often have to look at a string backwards to analyze the dna it can also be a source of fun or wordplay to look at phrases that are palindromes the same read backwards as forward here's a sentence in russian that you can see reads backwards and forwards if you understand the Cyrillic alphabet it means the boar pressed the eggplant in spanish we have a palindromic sentence that means there jogs the tortilla
while in french the sentence means and how is the cow finally we see an english sentence draw o caesar erase a coward which is particularly appropriate as we implement a caesar cipher in this module we'll use a new loop, the counting loop to reverse a string indexing a string in a loop can be done in several ways but we'll look at a very common approach we must understand a loop with three parts each part is separated from the other parts by a semicolon as you can see in the code on the slide the first part of the loop is called the initialization here the variable k is assigned the value zero which happens once before the loop guard and the loop body are executed the loop guard is evaluated each time before the loop block or body may be executed when the loop guard is true the body is executed and when it's false the loop exits sometimes the loop guard is called the loop test the increment here happens after all the statements in the loop body are executed after the increment the loop guard is evaluated again to see if the loop continues or exits we'll look at this more closely to understand the for loop we'll compare it to the while loop that you've seen before though not with this precise style of counting in a loop the for loop does not provide more power than a while loop or allow you to solve different problems the for loop is simply syntactic sugar or a dressing up of a while loop with all the parts in one place and many programmers think this makes the loop easier to write and to read as we've discussed the initialization happens once before the loop guard is tested you can see in the comparison here how initialization happens before the while loop too the loop guard is evaluated to see if the loop body will execute when the loop guard is false the loop is over both the while and the for loop will exit when this loop guard or test is false when the guard is true the loop body executes and as the last statement in the body we see the increment statement which will
execute we'll trace through the execution of a for loop in a particular example of a reverse function to better understand the loop and to trace this we'll look at the call reverse pit this means the value of parameter s is pit the local variable ret will accumulate the reversed string it's initialized to the empty string before the loop as we trace through the code the green arrow indicates the statement that will execute next the loop index or control variable k is initialized to zero remember that the loop initialization only happens once in a for loop and the variable k is only accessible within the loop but not after the loop its scope is limited when the loop guard is checked the value of k which is zero is less than s dot length which is three since pit has three characters the loop guard evaluates to true loop guards are always boolean expressions the loop body now executes and the string method char at accesses the character at a specific index we should point out here some people say care at the way i do and some people say char at either one is fine since k is zero the expression s dot char at zero evaluates to p we've shown the character p with single quotes which is used in java to indicate the primitive type char the value of ret is shown as the empty string in double quotes because these double quotes in java indicate a string literal concatenating the character p to the empty string yields a new string p the variable ret will be assigned the value p the string variable changes and it's no longer pointing to the empty string as it used to remember that strings in java are immutable we can create new ones but we can't change a string the increment executes after the loop body this changes k so that it has the value one after the increment statement we're ready to trace the next iteration of the loop the local variable ret has the value p the loop control variable k has the value one and we're ready to continue the trace k has the value one the length
of s the string pit is three and so the loop body executes since the guard evaluates to true remember that the dot char at method accesses the kth character here that's the character i whose index is one the character i is prepended via string concatenation to ret which is p and this creates a new string ip the assignment statement changes ret so that it references this new string and now the loop increment will execute the value of k is two and the loop will continue to execute we'll now trace through the last iteration of the loop the local variable ret has the value ip the loop variable k has the value two and as we can see here the loop guard will be evaluated since two is less than three the guard is true and the loop body executes s dot char at two evaluates to t as you can see here the character t is prepended to the string ip to form the string tip variable ret now references tip and the loop continues the increment statement executes which changes k to have the value three now the loop guard will be evaluated again as the loop guard is evaluated here you can see that the value of k is three and the length of s is three as well since three is not less than three the loop guard is false control in the program continues to the statement after the loop the value of ret is tip the reverse of the string parameter s so tip will be returned it's good to know that you'll see others write code and you should understand that many programmers use i as a loop control or index variable some programmers think the letter i is hard to distinguish from the number one but i is more common than k in reading other code many programmers use the post increment operator i plus plus instead of i plus equals one we won't explain the nuances of i plus plus here but it's a very common idiom in using loops and it's fine to use i plus plus by itself in the loop increment sometimes it's useful to define the loop index variable before the loop rather than in the parentheses of the loop this
allows the value of i to be referenced or accessed after the loop is over when the variable is defined within the parentheses of the loop that loop control variable can only be referenced within the loop body but not after the loop have fun programming and programming and programming as you loop and loop and loop hello we'll introduce the class character which you can use to determine properties of character values in java the type char is a primitive type like int, boolean and double some people pronounce it char some people pronounce it car and some people say care but everyone says the word character the same way so i typically use the careful pronunciation char values are specified with single quotes for example you can see quote a quote and quote one quote and quote space quote here these are character values the value double quote a double quote is a string value it's usually much easier to write the code than to say all these quote values the character class has several methods you can use in writing code you may remember the methods integer dot parse int and double dot parse double these were methods of the classes capital I integer and capital D double respectively the method capital C character dot to lowercase returns a lowercase equivalent of its argument for example this call will return a lowercase g since an uppercase g is the argument to character dot to lowercase if you pass a character that's already in lowercase the same value will be returned the table shows boolean valued functions like is lowercase and is digit and conversion functions like to lowercase and to uppercase using the java documentation will show you more boolean and conversion methods have fun building character and writing code we have the character demo class here in blue j and we've got two methods that I'm going to run through and illustrate now I'll add one very quickly this first method digit test creates a test string that has uppercase characters digits and punctuation moves
through every character of the string and calls the character dot is digit method a boolean method and the character dot is alphabetic method another boolean method so let's run through digit test and see what it does I'm going to create a new object on my workbench by right clicking and then I'm going to run the digit test method by right clicking on that and we can see here pretty clearly that a, b, c uppercase characters little a little b little c those are all alphabetic characters and then the digits are labeled as digit characters notice that no punctuation was printed so that when I go back to my editor we can see that the uppercase characters were all alphabetic the characters that look like digits were all labeled as digits and the punctuation wasn't any character in that it didn't have the label alphabetic and it didn't have the label digit I just want to illustrate one quick thing here I can also say if ch is equal to the character hashtag then I can print a message that it's a hashtag highly enlightening and now if I compile this program it compiled without any errors and I'll make another object I'll invoke the digit test method and we can see that lo and behold hashtag is a hashtag that's just a reminder that for characters we use single quotes to differentiate the values whereas strings use double quotes we can see that here where I've created another string test in the method conversion test I've created a similar string with uppercase characters lowercase characters digits and some punctuation I'm going to loop through by using the string char at method to store a character in the variable ch I'm creating a uch variable and an lch variable both of type char I'm creating them calling character.to uppercase which will return an uppercase character and character.to lowercase that will return a lowercase equivalent remember that converting a digit to uppercase doesn't change the digit at all and if a character is already lowercase converting it to lowercase
leaves it alone so running that method I will right click on my class and call conversion test we can see that I get the characters in my string on the left column the results that you get by calling to uppercase and the results that you get by calling to lowercase so I get character uppercase lowercase you can see that in each column I have all uppercase characters or digits and punctuation all lowercase characters or digit or punctuation and as one quick review of that code to remind you of where that came from you can see that I called to uppercase to lowercase and then printed them as the character using the Java documentation for characters will help a lot in making your program run smoothly when you're using character values have fun building more character than you did last time welcome back now you have all the concepts required to implement a Caesar cipher so let's get started developing the algorithm as always you should start with step one working an instance of the problem yourself even though we have seen some instances worked it is good to work a small instance so you can write everything down and think it through let's encrypt the message I am which is really I space A M with a key of 17 meaning you will shift each letter 17 positions through the alphabet you might want to start by writing down the alphabet and then writing down the alphabet shifted by 17 characters underneath it for example A has R beneath it where R is 17 characters to the right of A in the original alphabet next you would go through and replace each character in the message with the appropriate letter from the shifted alphabet when you are done you have the encrypted message Z space R D great you have finished step one now it is time to do step two and write down exactly what you just did the first thing you did was write down the alphabet then you computed the shifted alphabet the third thing you did was to look at the zeroth letter of the message don't forget when you index
into sequences such as strings and string builders the first element is at index 0 that letter was I so you looked in the alphabet to find I then you found the letter in the shifted alphabet at the same position which was Z so you replaced the zeroth character of the message with Z next you looked at the first letter of the message which is a space if you looked for space in the alphabet you would not find it so you would not change the character at index one of the message next the second character is an A for which you perform a very similar process as you did for the zeroth character and end up changing it to R finally you do the same thing for the third character which is an M that you turned into a D now that you have thought through all of that you have a list of the 17 things you did for this particular message and this particular key however there is one more thing that is good to note here before you proceed notice that your algorithm calls for replacing characters in the message if your message is a string you cannot do that as you recently learned strings are immutable meaning you cannot change them if you recognize this issue now you can adjust your algorithm to reflect the fact that you want to work with a string builder here we have added a step at the start to create a string builder from the string message and then we updated the algorithm to work on the string builder if you do not realize you need to do this now you would figure it out at a later step but the earlier you can figure out everything you need to do the better looking at this algorithm you can see that the first few steps are an initial setup before you begin performing repetitive steps for each letter in the message if you focus on the steps after the initial setup you can see that you are doing almost but not quite the same thing for each character in the message one significant difference is what you decide to do based on whether or not you find the letter in the alphabet
replacing the current character if you find it or doing nothing if you do not if you look at the steps for one particular character where the letter is in the alphabet you will notice that the character you looked for in the alphabet is the current character in the string and that the letter you used to replace the current character is what you found in the same position in the shifted alphabet now that you have thought this all through you can write down a much more general algorithm notice that the step number two here requires a little thought and a couple of statements when you are looking for patterns you should examine any constants such as zero here and ask if you always use that constant or if you need to look for a more general pattern here you always want to start from zero what about three do you always want to stop counting at three no how high you count depends on the length of the message here we have written that you want to count to the length of encrypted but noted that you want to count to less than it not less than or equal to it in our example the length of encrypted was four and you only want to count to three now it is time to test out the steps pause the video now and try to encrypt the message a space bat with the key of nineteen did you catch the subtle problem with this algorithm even though it computed everything you wanted we never said what the final answer is you want to be sure to explicitly say this so that you know what to return from your method when you translate it to code your answer is the string inside of the string builder you called encrypted now that you fix this detail of your algorithm you are ready to turn it into Java code thank you okay now that you've developed the algorithm for the Caesar cipher it's time to turn it into code we've started here with the Caesar cipher class which has an encrypt method which takes an input to encrypt and an integer for the key to use to encrypt it I've written here the
comments with the pseudo code that we just developed in our algorithm and I've gone ahead and written the first two lines of code the first step is to make a new string builder with the input and call it encrypted and the second is to write down the alphabet which is a string the next thing is to compute the shifted alphabet so we're going to make a string shifted alphabet and remember we saw in previous videos that we can use substring to do this alphabet dot substring key concatenated with that's what plus means for strings alphabet dot substring zero comma key that will slice the alphabet up into two pieces and concatenate them back together to give us the shifted alphabet that we want now we want to count from zero to being less than the length of encrypted and we want to call each number that we count i so this is going to be a counting for loop for int i equals zero, i is less than encrypted dot length the length of encrypted i plus plus and what do we want to do well we want to look at the ith character and call it curr char char curr char equals encrypted dot char at i then we want to find the index of curr char in the alphabet int idx we said over here call it idx equals alphabet dot index of curr char then we want to see if that's in the alphabet as you've seen before index of will return negative one if it is not there and some other number if it is so it would not be there if idx is negative one we want not equal to negative one to mean it is there now we want the idxth character of shifted alphabet and we want to call that new char char new char equals shifted alphabet dot char at idx and now I want to replace the ith character of encrypted with new char so encrypted dot set char at and then I want the ith character to be new char that's everything I want to do inside of that if otherwise do nothing I don't need to write an else if I don't want to do anything otherwise this curly brace closes my counting for loop here I said I want my answer so I want to return something to be the
string inside of encrypted so encrypted dot to string I'm going to compile that it said no syntax errors and I've written here already this method test Caesar which is going to make a new file resource so it'll prompt us for a file read it all in as a string encrypt it it will print out the encrypted message and then decrypt it and print that out too because if we just saw a jumble of random characters we wouldn't really be able to tell if we did it right but being able to tell that we got the original back makes us more confident that what we did was correct I'm going to go over here to blue j I'm going to say new Caesar cipher I'm going to say test Caesar and it's going to prompt me for a file and I have here this message in a file called message one dot txt that says free cake in the conference room maybe I want to send this to Robert without Owen seeing it and being able to get all the cake and so I'm going to choose message one dot txt and we can see the encrypted message here W I V V it's been made more difficult to read and then we can see down here where it decrypted it correctly so I'd be a little more confident in this I'd want to go through and check by hand that this actually came out with the correct key before I'd be completely certain that this test case worked correctly now we've written and tested our Caesar cipher class we tested it out on one message and it looked like it worked does this mean we can be sure our code is correct of course not remember when testing software each test case makes you more confident in your code but no number of tests can guarantee that your code is correct you want more and more tests to be more and more confident and one test case generally isn't enough so here I have a different message dear Owen no matter what you may have heard there's no cake in the conference room the cake is a lie please keep working on Coursera videos so I want to encrypt this and send an encrypted message to Owen to fend him off from the cake in case he intercepted my message to
Robert. I'm going to use my testCaesar, and now I'm going to do message2.txt. When I look at this, I'm going to see that the D in Dear turned into a U; that's good. But then E, A, and R were left the same. Then the O in Owen turned into an F, but then W, E, N were all left the same. In fact, most of this is unchanged, so my code isn't quite right. If I look back at it, I'm going to see that it's only going to work with capital letters and not with lowercase letters. This was the problem in our first test case: it only had capital letters, it didn't have lowercase letters, so we didn't really test all the cases enough. We're going to leave this to you to fix. There are a couple of different ways you can do it, and we hope you can figure one of them out, so good luck and happy fixing of your code. Let us wrap up this lesson, where you learned a lot. First, you learned a bit about the historical as well as modern importance of cryptography. Then you learned some more concepts about strings, as well as how to use a StringBuilder to efficiently construct a string. Then you learned about counting for loops, adding another important tool to your programming repertoire. Finally, you implemented a Caesar cipher, a classical cipher dating back thousands of years. As you will learn in the next lesson, this cipher is not secure by modern standards, but it is a good starting point to understand the ideas of cryptography. Hi, I'm Jeff Forbes, a computer science professor at Duke University and a friend of Susan, Owen, Robert, and Drew. I do research in computer science education and learning analytics, but I also teach data structures using Java. I'm really excited to be able to give a guest lecture about using arrays to break, or crack, a Caesar cipher, a method of encryption I know you've been studying. You or someone else has implemented a program to encrypt text using a Caesar cipher. This is a very basic and historically interesting form of encryption, though it's not secure given patience, a computer, and your
skills at programming. The concepts in cracking the cipher are useful in solving other problems too. A key is used to encrypt, to shift all the letters in a message, but how do we decrypt? We know that decryption must be possible, since the intended recipient must be able to decrypt and read the encrypted message being sent. Because a shift of 26 is the same as a shift of 0, encrypting with a shift of 7 followed by decrypting with a shift of 19 will result in the original message, just like a shift of 26. How does knowing this help us crack the cipher? A thief or hacker could find the key, which is a number. Keys are often numbers, both in the Caesar cipher and in many other forms of encryption. The hacker simply subtracts the key from 26 and will be able to decrypt the message. If the hacker or thief doesn't have the key, is it possible to use brute force or some other way to crack the cipher? Brute force means trying every possible key. With a human helping, using brute force with a Caesar cipher makes it relatively easy to decrypt the message. Suppose we intercept this message, which is too difficult to pronounce. Can we tell what this message says simply by looking at it? That seems unlikely. If we knew the key used to encrypt this message, we could easily decrypt it. But how many keys are there? Perhaps we can simply try them all. That's what a brute force approach is: the basic idea is to try every key. We already have the code to encrypt the message. We'll use every key from 1 to 26, or 25, to encrypt the message we're trying to decrypt. Since the decryption shift is just 26 minus the original encryption shift, if we try all 26 shifts, we'll find the original message. We can try every key using this brute force approach because the number of keys is small and trying each key is fast. The same approach won't work for other forms of encryption, because there might be too many keys. It's also possible that using each key to encrypt could take a long time. Before we talk about an approach that's more
sophisticated than brute force, let's work to understand brute force in what we call eyeball decryption. Our goal is to unlock, or decrypt, an encrypted message. We don't have the key used to decrypt; we're not that fortunate. However, we do have the code used to encrypt, from the class CaesarCipher. Using that, we can try all 26 keys to decrypt using a human, or eyeball, approach. We'll create a CaesarCipher object. We'll try all 26 keys, from 0 to 25. We'll use our CaesarCipher object, named cipher, to shift the message with each of the 26 keys. Then we'll print the result of the shift. As we'll see, we can decrypt the message if we recognize words. How do we find the original message? When we run the code we just discussed, we'll be able to view, or eyeball, the result of encrypting 26 times. We'll scan all 26 strings produced by the 26 different keys, and we'll do this methodically. As we eyeball each string, we look carefully to see if the string is recognizable as English, since we're looking for an English-language message. This line isn't recognizable. This line doesn't look like English, but let's look closely. No, it's not English. We'll look at the next line. Let's examine this line closely. This line is easily recognized as English text, and we see that encryption and security are fundamental parts of today's internet. Hello. You're going to learn the basics of a powerful programming construct, the array. Nearly every programming language in widespread use has a similar structure to be able to represent many items with one variable. When you wrote code to help a genomic scientist, you could count the number of occurrences of each C, G, T, and A in a strand of DNA. This is a real problem that helps find protein-coding regions rich in CG content. As a programmer learning about encryption and decryption, you'll need to count the number of occurrences of A, B, C, and every letter of the alphabet through X, Y, and Z: 26 counters, to be able to break a Caesar cipher, to decrypt a message encrypted with a Caesar cipher.
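The eyeball decryption described above can be sketched roughly as follows. This is a minimal, self-contained sketch, not the course's CaesarCipher class: the class name EyeballDecryption, the static encrypt method, and the choice to shift lowercase letters as well as capitals are all assumptions made for illustration.

```java
class EyeballDecryption {
    static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Shift each letter by key positions (0 to 26), wrapping around the alphabet.
    // Non-letters are left unchanged. Handling lowercase letters here is an
    // assumption of this sketch, one of several possible fixes.
    static String encrypt(String input, int key) {
        StringBuilder encrypted = new StringBuilder(input);
        String shifted = ALPHABET.substring(key) + ALPHABET.substring(0, key);
        for (int i = 0; i < encrypted.length(); i++) {
            char currChar = encrypted.charAt(i);
            int idx = ALPHABET.indexOf(Character.toUpperCase(currChar));
            if (idx != -1) {
                char newChar = shifted.charAt(idx);
                if (Character.isLowerCase(currChar)) {
                    newChar = Character.toLowerCase(newChar);
                }
                encrypted.setCharAt(i, newChar);
            }
        }
        return encrypted.toString();
    }

    // Brute force: print the message shifted by every key from 0 to 25,
    // so a human can scan the output for a line that reads as English.
    static void eyeballDecrypt(String encrypted) {
        for (int k = 0; k < 26; k++) {
            System.out.println(k + "\t" + encrypt(encrypted, k));
        }
    }

    public static void main(String[] args) {
        // Hypothetical message and key, chosen only to have something to crack.
        String secret = encrypt("Free cake in the conference room", 7);
        eyeballDecrypt(secret);
    }
}
```

In the printed list, the readable line should appear next to key 19, which matches the point made earlier that a shift of 7 is undone by a shift of 26 minus 7.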
Although you could do this with 26 variables, you'll learn a new concept to help with the coding. You'll learn about arrays, which are homogeneous collections of values. Here you see post office mailboxes. They each look the same, but each is numbered differently. You can access box number 344 or box 345, and you can either store letters and packages in a box or take them out of a box. Arrays are a similar concept in programming. As you'll see, one array could represent 26 or 1,026 counters, if your alphabet was huge or if you were solving a different problem. Let's look at code to help a genomic scientist. We'll look at code that counts occurrences of C, G, T, and A to understand and motivate the problem of counting 26 letters. As you can see, we've created four counters, initialized each one to zero, and then incremented the appropriate counter as we process every nucleotide in a digital strand of DNA. This solution works, but it's very hard to scale to having 26 counters, which we would need to count every letter A through Z as part of decrypting a message encrypted using a Caesar cipher. This isn't conceptually hard. We could use cA, cB, cC, and so on up through variables cY and cZ to have 26 variables, and we could have 26 if statements to increment the appropriate variable. But writing the code and changing what we do with 26 values is time-consuming, and it is difficult to change if we want to print something different, for example, when looking at the output. We'll use an array, an indexed collection, to use one variable in place of 26. We'll break the Caesar cipher by counting occurrences of each character in an encrypted message. A message in English would typically have the letter E as the most frequently occurring character. So once we've found how many times each character occurs in an encrypted message, it's likely that the most frequent character is E, and we can determine the shift used in encrypting the message, making the decryption process easy. In general, counting and collecting values is an important
tool in writing programs. Thinking about arrays in the context of breaking a Caesar cipher will help in solving many other problems. You've seen the class StorageResource, which was helpful in storing String values. That class was useful, but its use was limited. We'll expand on the idea of arrays and the StorageResource class later. For now, we need an indexed collection, like the collection of mailboxes we saw, in which a number is used to access a specific location. This is the same concept you've seen with strings, in which an index is used to access a particular character in a string using the .charAt or .substring methods. Arrays can store any type of value, not just the type char that's used in strings. We'll look at the concepts and code in using arrays and see how they're similar to what you've seen before with strings. You define an array similarly to defining a string. When defining a String variable, as you see here, you indicate the characters that are in the string and assign these characters to a variable. With an array, you must specify how many storage locations there are and use the square brackets with the variable name to indicate the variable is an array. This code allocates 256 memory locations, each one holding an int with the value 0, which is the default value for integers in an array. The concept of indexing is used to access elements of a string and an array. With a string, the .charAt method is used with an index to access a specific character. The first character is at index 0, because we use 0-based indexing. With arrays, the bracket operator accesses a specific element, again with 0-based indexing. When writing code, you often need to know the number of characters in a string or the number of elements in an array. With a string, you use the .length() method to determine how many characters there are in the string. With an array, you use the value stored in .length to access the number of storage locations allocated for the array. Note that .length is not a method
for arrays. This is sometimes a source of confusion when writing code. We'll look at the code to count the number of occurrences of every character A through Z. You'll see that this code and the concepts in it will help you solve many problems when programming. In this code, you'll see a variable named counters that will represent 26 different counters. The code will store the number of occurrences of A in counters sub 0. We use sub as a shorthand for subscript, from mathematics. We'll see that counters sub k is the number of occurrences of the kth letter. By this we mean that the number of B's is at counters sub 1 and the number of Z's is at counters sub 25. As you look at the code, you'll see there are three parts, just as there were with the code that counted the number of occurrences of C, G, A, and T in the string representing DNA. In that code, four counters were defined and initialized to 0. Here, there are 26 counters defined and initialized to 0. The array referenced by the variable counters takes the place of 26 different variables. In the DNA counting code, we used a sequence of four if statements to determine which counter to increment. Here, we use the location of a character in the string alpha as the index of the appropriate counter to increment. Note that A has index 0. We even count both upper- and lowercase A's, using the Character.toLowerCase method, so that the index value returned by alpha.indexOf helps us increment the appropriate storage location in the counters array. Finally, to print each result, we use the loop index k to access both the kth value stored in alpha and the kth value stored in the counters array. You'll gain experience in solving problems with arrays. Here's a quick summary of what we've just introduced. Arrays are indexed collections of values. When defining an array, you will typically provide an integer value indicating how many elements can be stored in the array. It's possible to define a variable like x, as you see here, with no storage. You can also use the array to
define the type of the variable. This could be useful, for example, as a parameter in a method. If you define an array by calling new, you must provide an integer value for the number of array elements. In an int array, all locations will be initialized to 0. For a String array, all array locations are initialized to null. That's the value we've seen before that indicates there is no object being referenced. All locations are read and written using indexes. You can store a value in an array, as shown here with s sub 3 getting the string hello. This is writing a value into an array location. You can also access, or read, an array location, as shown here, where x sub 3 on the right-hand side of the assignment statement is read and used to assign, or write, a value to the location on the left-hand side of the assignment statement, x sub 2. Once the storage is allocated for an array, the array size does not change. This may be why .length is not a method but a value. When an array is passed to a method, the contents of the locations referenced by the array can change. This is subtle, and you'll see examples of it when we use arrays to solve problems. Have fun coding. Computers are very good at modeling and simulation, in part because computers can perform billions of mathematical operations every second. Often, simulation and modeling rely on generating random numbers. In this coding example, we'll see a simple use of arrays to help determine how really random the java.util.Random class is. Computers don't use natural random phenomena, but model randomness with what's called pseudo-randomness, using mathematics to model what might be random events in nature. In this example, we will simulate rolling a pair of dice many times by generating random numbers. We would like to know: how many 2s do we roll? How many 7s do we roll? We will count how many times each type of roll occurs. So I've got this code right here called simpleSimulate, and I can run it and say how many rolls I want to do, and right now it counts how many times I get
2 and how many times I get 12. Let's run it and see what it does. So we'll come over here; it's already compiled. I'll create the object, and we'll run simpleSimulate, and here we actually get to choose how many times, so I'll say 10,000 times, and we'll just run it. You can see I got 2, 298 times, and I got 12, 271 times. It'd be nice to run it for all the possible rolls you could get, from 2 to 12, so let's look at doing that. What I'm going to do is cut and paste this, and we'll write a different method. So we'll just copy this whole thing, and then we'll put it up here, and we'll just call this one simulate. There we go. So now we're going to have to modify it. The first thing is we generate rand, and then we'd need counters for 8, 9, 10, 11, 12; that's a lot of counters. So what I would like to do instead is to use an array of counters, so that's what we'll do. We'll replace this here with an array. We want an array of counters, so it'll be an int array instead. We'll have to give it a name; we'll call it counts. And then we'll have to create the array. It's an integer array, and then what size is it going to be? For 2 through 12, we need at least 11 counters, but it would be kind of nice if we set it up so that when we threw a 4, we would count 4 in the fourth slot of the array. So what we'll do is create the array with size 13, which will allow us to use indexes from 0 to 12, and really we'll just use 2 through 12; the 0 and 1 counters won't be used. So I'll make my array of size 13. I don't need this 12 counter anymore, so we'll get rid of that. Now we're going to have to adjust the code to handle using this array of counters. We're still rolling whatever number of times we type in for rolls, so that's the same. We're still throwing two dice, d1 and d2, and we are going to add them together. But as you can see here, we have this if statement where we had to check and see: was it 2, was it 12? And we could add all these other if statements: was it 3, 4, 5, and so on. But instead we can just
notice that when we just add the two together, that's also the index position of the counter, and that's all we have to do. So we can get rid of this if statement, and instead we're just going to add code to update our counter at that position by 1. Our array is called counts; in the d1 plus d2 slot we're just going to add 1: plus-equals 1. That's it; that replaced a huge if statement. Let's see, now we're going to output. You can see we have two lines of output, for 2s and 12s. Instead, it would be nice if we just had one print statement and used it over and over again for each position in our counter array. So we're going to add a for loop here, and remember, our counters started at 2, so we'll have our k start at 2, because the lowest thing you can get is two ones, one from each die, and that's 2, so that'll be the lowest. And then it'll be k less than or equal to 12, because we're going to go from 2 to 12, and then we'll update it the same way, k++. And then we just need one of these output statements to modify. I'll just type a new one. So System.out.println, and what we want to do is print k, because that'll be the counter, k for 2 through 12. So we will print k, and then we can still do the quotes, equals, and a tab (backslash t is the tab), plus, and then we want to know how many, like, 2s were there, or how many 10s were there, etc., and that's just counts[k]; that's how many of type k there were that we counted. Plus, we can add another tab, and then we can add the rest of this: 100 times counts of k divided by rolls, and I'll just put that in parentheses; like I said, it doesn't matter. We do need to add a semicolon at the end, and then I can get rid of these two print statements here. There we go. So I think we have everything now. We now have our loop that prints, for each one, 2 through 12, what it does. Let's see if we have it compiling. We're missing one right paren right here, so we can
add that. Oops, we'll add it and then put our semicolon. There we go. Let's try compiling it again, and it says it compiles. There's our code, so now let's run it and see what happens. It's already compiled; we can create our object, and then we can run it. Not simpleSimulate; this one is called simulate. And then we'll run it again; let's run it 100,000 times. There it goes. OK, so now you can see it counts for 2s, 3s, 4s, and so on. How do you know it actually works? We just got lots of numbers. It looks pretty good, in the sense that if you roll, you're going to get 7 more likely than any other thing, because 7 is 2 and 5, 1 and 6, 3 and 4, and it does have the highest number. But to really be sure, what we'll do is run a simple example: we'll add a print statement and run it, like, 10 times, to check that the counters are really working correctly. So let's do that real quick. We'll go back to the code here. This is really just for testing it, to make sure that the counters are really the counters we think they are. So whenever we roll the dice, which is right here, I'm just going to add a print statement. What I'll do is print out that the roll is d1 plus d2, which equals d1 + d2. That's all we're doing; we're just printing out the roll. Then we're going to run it a very small number of times so that we can see our counters are actually working correctly. So we'll compile that, and that worked, so let's run it now. I'm just going to run it 10 times, throw the dice 10 times, and let's see what we get. It looks like one, two, three 7s, and if you look, it counted 7 three times. We got one, two, three 10s, and it counted 10 three times. We got a 4 once. So it looks like it is counting them correctly. This is the way you can kind of test it on a small amount of data, and that seems to work. All right, so anyway, I hope you enjoyed dice rolling. Thank you. What are the most common words in the English language?
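Looking back at the dice-rolling example that just finished, the array-of-counters version might look roughly like this. It is a minimal sketch under stated assumptions: the class name DiceSimulation is hypothetical, and passing in the Random object (so a fixed seed can make a run repeatable) is a choice made for this sketch rather than something shown in the lecture.

```java
import java.util.Random;

class DiceSimulation {
    // Roll two six-sided dice the given number of times and count each total.
    // The array has 13 slots so that a total of k is counted at index k;
    // slots 0 and 1 are never used because the smallest total is 2.
    static int[] simulate(int rolls, Random rand) {
        int[] counts = new int[13];
        for (int k = 0; k < rolls; k++) {
            int d1 = rand.nextInt(6) + 1; // nextInt(6) gives 0..5, so add 1
            int d2 = rand.nextInt(6) + 1;
            counts[d1 + d2] += 1;         // the sum is also the counter's index
        }
        return counts;
    }

    public static void main(String[] args) {
        int rolls = 10000;
        int[] counts = simulate(rolls, new Random());
        // One print statement, reused for every total from 2 through 12.
        for (int k = 2; k <= 12; k++) {
            System.out.println(k + "'s=\t" + counts[k]
                               + "\t" + 100.0 * counts[k] / rolls);
        }
    }
}
```

As in the lecture's check with 10 rolls, a small value for rolls plus a print of each d1 and d2 is an easy way to confirm by hand that each total lands in the intended slot.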
There are many ways to find common words. Using search tools to find shared data that has been curated from thousands and thousands of available texts helps here. Others have already done such work to figure out what the most common words are; for example, they have determined that the word the is used more than any other word. Did Shakespeare use common words? We can use our edu.duke classes, public domain versions of Shakespeare's plays, and some simple array code to help answer this question. In this example, we will examine several of Shakespeare's plays and count how many times he uses the most common words. All right, so I've started the code for counting the most common words in Shakespeare's plays. Here's the code; let's look at it. I've got this method called countShakespeare, and you can see here, actually, this is a new thing: this shows you how you can initialize an array right here. We've got the array plays, which is an array of strings, and we can just put curly brackets and list the items in the array. So what I've done here is list six data files that we've gotten that have text from six of Shakespeare's plays. We've also got another String array called common, and there's a method called getCommon. Let's go look at that. Here's getCommon, and here you can see there's another data file; this one's called common.txt, and that's where I've already put 20 common words, the words that somebody else has determined are the most common English words. We're just going to read that in, and since I happen to know there are 20 words in there (that's important), I can create the common String array of size 20, because arrays have to know the actual size. So I'm reading that in, and this loop here just goes through and reads them in one at a time and puts them in there. So we read in the 20 most common words; that's all getCommon does. Let's go back down here. So we've got the plays, we've got the common words, and now we have a
loop which is going to go over and read in each play. I've got the plays stored in a data folder, so you can see I'm adding "data/" right before the name of the play, and then I'm going to call countWords, which, for each word in the play, is going to check and see if it's one of the common words, and if it is, it's going to count how many times each common word appears. And then we just print a message that says, hey, we're done with the plays. Once we've counted all the common words, we go through and print for each common word; for example, we're going to print the, and then how many times it occurs in all six of Shakespeare's plays. So let's go ahead and run this and see what happens. We'll compile it; it's compiled. We will create an object, and we'll just go ahead and run countShakespeare, and you can see it doesn't work quite right: it's only counting "of" a lot. So let's go back and look at the code. It turns out I haven't given you all the code; we still have to write one of the methods. So let's go see. We are going to have to write indexOf. What indexOf does is take a list of words and a word, and we would like it to look for the word in the list of words, or the array list of words, and see if it's in there and count how many times it appears; or actually, we want it to give you the location, the index, of where it appears in there. So what we're going to have to do is loop over all the words in the array list and check to see if word is equal to one of them, and if it is, we'll return the location of where it is. So let's go ahead and start that. All right, we will start with a for loop; in order to loop over the array, we'll need a for loop. We'll need a variable, int k; we'll start at zero, and then go as long as we don't go off the end of the array, so we'll say k less than list.length, and then we need to update our counter k by one. OK, so we'll go through, and then what we
need to do is, for each kth slot, check and see if the word matches the word that's in that slot. So we'll ask: what is the word in list[k]? And we need to compare it to word, so we'll use .equals, and if it equals it, we found it, so we can return the location of it: return k. So this loop will go through and check to see: does word equal this word, this word, this word? And if we find it in slot k, we return k. If we don't find it, we need to indicate we didn't find it, so we need to fix this return that's after we've looked at everything. We'll change this to minus one, which is not a position in the array, and that will indicate that we didn't find it. Now let's see how indexOf is being used. We come over here; indexOf is being used in countWords, which is right here, and you can see here we call indexOf for each word. We're getting each word out of the file, we pass the common words and we pass the word, and then we check to see: does it match? If it matches, then we're going to use the counts array to update that count. So for the word the, we'll keep track of how many times we find it. Every time we find it, we will update the counter in counts, which is in the same slot as the word the in the common array. All right, let's try and run this now. We'll compile it; no syntax errors, that's good. We'll come over here and run it, so we're going to run countShakespeare. There we go. What this does is, first of all, you can see I've got six data files: caesar.txt, errors.txt, hamlet, likeit, macbeth, and romeo. We've read all of those in, and for each word in that file, we want to see if it matches one of these common words here. You can see the common words are the, of, and, and so on, and you can see that the appears in the combination of those six texts 4,237 times. And then you can see another one, for, appears 1,071 times. So you can see how many times each of these common words appears, and it looks like Shakespeare did use a lot of common words. Now, how do we know this is
actually correct? They're just big numbers; how do we know? So what I'm going to do is modify my program so I can run it on a really small file, just to make sure and convince myself that I'm counting these numbers correctly. I will come back over here, and you can see I have this file called small.txt, where I just put a few words in there. What I want to do now is modify my code so that I look at this file, small.txt. So back here in countShakespeare, I'm going to comment out these two lines, and I'm going to create my String array plays just to be that one file. So I will say String plays is going to equal just the one file, which is small.txt. That's all I have to do, and now I'm going to run it, and this time it will just look at that one file. So let's compile it; we got it compiled. We need to run it, and there we go. So let's see. If you look in small.txt, you can see the word the appears one, two, three times. If you look over here, you can see we counted the three times; that looks good. Of appears twice, and we got two; and appears twice; a appears once. So it looks like it worked. Now I'm really more convinced that I counted the Shakespeare words correctly. All right, that's it. Thanks. We'd like to automate the cracking, or breaking, of the Caesar cipher. To do this, we'll rely on frequencies of letters in English text. If you're decrypting a message in another language, you'll need to use the frequencies from that language, but the approach will be the same. We'll write code to find the character that occurs most frequently in the message being decrypted. We'll assume this is the letter E, since E occurs more frequently than any other letter in English text. In Russian, for example, the letter O occurs more frequently than E. If our assumption about E is wrong, we won't decrypt the original message. It's possible to use more than just E, and to rely on the frequencies of all the letters and use statistical approaches to break the
Caesar cipher. In some cases, these approaches will break other encryption methods, though not the methods used to encrypt data for online shopping and secure transactions. Let's look at the code for decryption in two steps. We need to count the occurrences of each letter A through Z in the message we're decrypting. We'll have code to scan each letter of the text and increment a counter for each of the 26 different letters. Initially, all the counters are zero, since we haven't started scanning the text letter by letter. Each counter is numbered from 0 to 25, because the counters are array elements. The index from a string of 26 letters will help us find the right index for the counter we'll increment as we scan the text. As we scan the message, looking at each character, we'll increment the counter at index 7 for H. Then, as we scan I, we'll increment the counter at index 8, which is the index of I in our alphabet string. We won't increment for the comma or for the space. Then we'll increment the counter at index 3 for the D and the counter at index 14 for the O. We won't increment for the space, because space doesn't occur in the alphabet. Then we'll increment the counter at index 24 for Y, and we'll set the counter at index 14 to 2 when we scan the second O in the message. When we're done scanning every character, we'll have these values for each counter. If you look carefully at these values, you'll see a problem for our decryption method: the value of the counter with index 4 is 0. There are no E's in this message, but this is a very unusual message. Now we'll look at the code for this idea. We scan the message character by character using a standard for loop. We find where the character occurs in a string of each letter in the alphabet, so that E will be found at index 4. Notice that we converted the characters in the message being decrypted to lowercase. We use the index in the alphabet to increment the corresponding counter as part of decrypting the message. If the character wasn't in the letters of the alphabet, the
indexOf method returns negative 1, and we don't increment any counter. The code that uses the idea of E occurring most frequently is straightforward, developed from the ideas, algorithm, and code you just saw. As you can see, the code isn't very long. We've created two helper methods and relied on the CaesarCipher class to help. We've called a method countLetters, which we just discussed. This method will count the occurrences of every character in the string encrypted, with A being stored in the first, or index 0, location of the array returned by the function and referenced here by the variable freqs. Then we call a method maxIndex to determine the index of the entry in freqs that is the largest. This location, we will assume, is where the E was shifted. We'll find the distance from this location to E. E has index 4, since we start with 0 for A and then get B, C, D, E for 1, 2, 3, 4, respectively. If the maximal index is less than 4, we'll need to wrap around 26 to find the shift used for E. If the value dkey was used to encrypt, then 26 minus dkey is used to decrypt, and we return the decrypted string. You'll be ready to use your programming knowledge to finish the task of decrypting and then apply this knowledge in the mini project with a different cipher. But there are some details we want to highlight. The array freqs in the code we just saw has a relationship between the index and the value in the array. For example, freqs of 8 is how often I occurs, since I is the 9th letter and has index 8; remember, we start with index 0. When looking for a maximal value, as with the method maxIndex that we called and whose implementation you see here, we return the index of the largest value, not the largest value itself. We use the index to find the distance from E. Using the existing CaesarCipher class made decryption much more straightforward. In general, it is a good idea to use code that has already been developed and tested rather than reinventing it. Have fun coding. Hello. In this module, you learned about
arrays and how to use arrays to crack a cipher. You'll continue to solve problems using arrays; they're a powerful programming tool. Arrays are indexed collections. In Java, you use brackets to indicate an array variable, and arrays can store Strings or ints or even StorageResources; almost anything can be stored in an array. The power of an array is that one variable name can represent two, a thousand, or a million different values, and each value can be accessed separately from the others. This access is done by a numeric index, a value that starts at zero for the first element in the array, just like you did with strings. Just as one of a group of mail or post boxes can be found quickly using a number, accessing an array element by its numeric index helps in storing and accessing values. Here's a quick overview of how arrays work in Java. In Java, arrays are created using new, and once an array has been created, its size does not change. However, the values stored in each indexed array cell can change, and that's what makes arrays useful and powerful. Arrays are created using new, and brackets are used both to indicate that a variable like names is an array of Strings and as part of the syntax in new to specify the number of items in the array. You can define arrays of int values in Java just as you can define arrays of String values. int values in an array are initialized to zero; String and other object values are initialized to null. You can assign values to an array using an index, and you can access an array to update the contents as well. You've used indices to access array elements, and loops are typically used to access all the elements in an array. Here's a loop that starts at the first index, zero, and loops through the last valid index, which is one less than the length of the array list. This is typical code that loops over array elements. In the loop body, the loop control variable is usually used to access each element in the array. In this loop, the value of k is used to indicate the index at
which a word is found in the array parameter list. In general, this pattern of looping over all elements using indices is very common in solving problems using arrays. We used arrays to solve several problems, including cracking codes. You saw how arrays were used in cracking the Caesar cipher method of encrypting messages: using frequencies obtained from the message, together with indexing and arrays, made it possible to crack a Caesar cipher. Indexing was also used in both encrypting and decrypting, as well as in cracking the code. It's good to know that the encryption used on the internet for transactions is far more secure than the Caesar cipher. Scientists and mathematicians don't think that today's internet encryption can be hacked or cracked using brute force or even smart algorithms, but you should be careful online with your personal information in any case. Welcome back. Now that you are becoming proficient at solving problems in Java, it is time to talk about some aspects of Java that will help you solve larger problems. You've probably heard that Java is an object-oriented language, but what exactly does that mean? 
As the name suggests, your code works with objects, which you have already been doing with objects from a wide variety of types: Strings, ImageResources, and CSVRecords, just to name a few. One of the important characteristics of objects is that they encapsulate code and data, putting them together into one logical unit. You have already written methods, which are the code in an object. However, you have not yet made your own fields, which describe the data within an object. As a familiar example, think about Strings, which you have worked with a lot. A String is an object, and it encapsulates code and data together. For a String, the data is the sequence of characters which it represents. These characters are logically inside of the String object. You can also call a lot of different methods on a String to operate on it, and thus on the data inside of it. You are familiar with many methods in String, such as indexOf and substring. As you learn a bit more about object-oriented programming, it is useful to be precise with terminology. A class defines a type: specifically, what fields and methods are inside of objects of that type. Objects are instances of a class. You can make many different objects from the same class, which you do with new. You have seen new before and learned that you use it to make things, but now you can be a bit more precise: new creates a new instance of a class, that is, a new object. So why use an object-oriented language? What is the point of classes and objects? A long time ago, programming language designers realized that it was helpful for programmers to be able to think in terms of objects, as they correspond more naturally to how you think about the world. They designed object-oriented languages around this idea, along with a variety of features which help programmers design and write large programs. You're going to learn some basic features of object-oriented programming here so that you can create your own classes with both code and data. If you continue onwards to take our Java Programming
Principles of Software Design course after this, you will learn some more principles and techniques of object-oriented programming. So let's dive right in. Welcome back. To start learning concepts of object-oriented programming, it is useful to take an existing piece of code and see how you could write it differently in a more object-oriented way. Here is the code you wrote in a previous lesson to perform encryption with a Caesar cipher. As you may recall, one of the parameters to this method is the encryption key that you want to use, and the first thing you did with this code was to compute the shifted alphabet based on this key. Here is a different way to write the CaesarCipher class, which does exactly the same thing but makes better use of the object-oriented nature of Java. This class has two fields. A field is a special kind of variable which lives inside of an object instead of inside of a method. Here the two fields are the Strings for the alphabet and the shifted alphabet, and they have been moved outside of the encrypt method. Notice that they are now declared inside of the class but outside of any method. These are now data that are encapsulated in your object. When you make a CaesarCipher object, it will have these two fields, which any code inside the class can refer to by name. Next, notice that there is some code here which looks like a method where we forgot to write the return type and named it the same as the class. This code is actually a constructor, which means code that gets run to initialize an object when it is created using new. This constructor takes in the key as a parameter and initializes the alphabet and shifted alphabet fields using the same approach used before. If you look further down in this implementation of the Caesar cipher, the encrypt method looks mostly the same as before, but it no longer takes the key as a parameter, nor does it compute the shifted alphabet. The rest of the code is the same. In fact, the code uses the alphabet and shifted alphabet
fields in the object, even though these are not declared inside the method. This code is allowed to use them because it is inside the object, so it can use any fields within the object. An illustration helps in understanding the differences between these two approaches. In the old code, a CaesarCipher object held no data. When you do new CaesarCipher, you pass in no arguments and create an object with nothing in it. When you call encrypt, you pass the message and the key, and the method returns the encrypted message. In the new way, each CaesarCipher object contains a key. Now when you do new CaesarCipher, you pass in the key, and the object you have created stores that key inside of itself. When you call encrypt on such an object, you pass in only the message. The key is already in the object, and it still returns the same encrypted message as before. So if these two implementations produce the same result, what are the benefits of an object-oriented approach? When you encapsulate the key inside of the cipher object, you have one thing that is capable of taking a message and encrypting it. That makes a nice logical unit to think about: a thing that does a task. You do not need to separately track the key and pass it in. For small programs, such as the ones you have written so far, that may not seem like a big deal. However, as you solve larger problems with more complex code, this design idea will help you immensely. Now that you have seen an example of these ideas in action, the rest of the lesson will teach you about the details of the technique you just saw. Hi. Now that you know a bit about object-oriented concepts, it's time to go a bit deeper into the idea of fields, which are also called instance variables. Remember that when we redesigned our Caesar cipher to be more object-oriented, we created two fields, one for the alphabet and one for the shifted alphabet. These fields are declared inside of the class but outside of any method. They belong to the object and are created when new is called to create an object, either in
code you write or when you create an object in BlueJ that's shown on the object workbench. These fields are part of the object and exist as long as the object exists. What does all this mean? Every CaesarCipher object you create has its own alphabet and its own shifted alphabet. This is why fields are also called instance variables: they act like variables where there is one variable per instance of the class you create. What does this really mean? Let's look more deeply at fields and instance variables. Every CaesarCipher object has its own copy of the alphabet and the shifted alphabet. You can make different CaesarCipher objects, and each will have its own instance variables. These variables are specific to the instance. If you create three CaesarCiphers with three different keys, each has its own copy of these fields with potentially different values. One cipher might have a shifted alphabet that begins QRS, for example, based on the integer shift value passed to the constructor, while another object might have a shifted alphabet that begins MNO, and the third object could have one that begins HIJ as its instance variable, set, again, when the object is constructed. To call a method like encrypt on an object, you'll typically use a variable name like cc, and you'll write cc.encrypt. You can also call a method from an object on the BlueJ object workbench. These objects have names too, and when you call .encrypt, for example, the method will use the values of the fields in the object you use to call the method, like cc if you wrote cc.encrypt. Calling encrypt on this CaesarCipher will use its QRS shifted alphabet. So when we call .encrypt and provide the message first legion attack east flank as the parameter, the QRS alphabet is used to create the encrypted version of the message you see here. In the same way, calling encrypt on this CaesarCipher object will use its MNO shifted alphabet. This is the principle of encapsulation: the method and the data are logically inside of the object, and the method acts on the data inside
the object that it lives in. As you can see here, calling encrypt uses this shifted alphabet, and we see a different encrypted version of the same message; the shifted alphabet that starts with MNO is used here. Finally, using this object with HIJ as the field will result in a different encrypted message. When encrypt is called, the encrypt code uses the shifted alphabet of this instance, and the encrypt code creates the encrypted message you see here. Fields, or instance variables, are very important concepts in designing and using classes, since they can be accessed by every method in the class, like encrypt that you see here, and by the constructor as well. As you begin to design your own classes and think about the fields and methods to put in these classes, here are a few design principles you should keep in mind. The first is that a class name should correspond to a noun. Classes describe things. Each object you make for a particular class is one of that thing. Let's think about the classes you've seen so far in this course: Strings, Pixels, CSVRecords. Each of these is a noun; it describes a thing. A class could be a Car, for example. That's a noun, and then the methods and fields would correspond to things that a car can do. Methods, on the other hand, are verbs. They're what you do to or with an object, things like getPixel, setCharAt, or encrypt. Sometimes method names don't sound like verbs, such as substring or indexOf, but these describe actions: get a substring, or find the index of. The programmer has just shortened the name. For the car, a method might be accelerate or brake. These are things a car can do; for example, invoking a method would make the car go faster or stop suddenly. Fields, or instance variables, are important class concepts. Fields are also nouns and should describe things that the class has. The String class might have a field for a sequence of characters. A sequence of characters is a thing, and the String has one of these things. Similarly, an image might have many pixels. Fields can also be
adjectives, as they describe the properties of an object. For example, a Pixel might have a field or fields describing its color; that could be an adjective giving more information about the properties of the pixel. For cars, fields could include things a car has, nouns like an engine with a certain number of cylinders or wheels of a certain size. For cars, adjectives might describe the color of the car, or the kind of engine the car has, or the type of wheel. As you get started making more complex classes, we will provide guidance on the fields and methods you should make, but think about these design principles as you write your code. As you gain more experience, you will want to start designing classes on your own based on these ideas. Happy coding. Welcome back. In this lesson, you are going to learn about the two visibility modifiers that you saw in the object-oriented version of the Caesar cipher: public, which was used here, and private, which was used here. There are two other visibility modifiers; however, explaining them requires some more advanced concepts, so you will only learn about these two for now. When you declare something, whether it is a class, field, method, or constructor, as public, it means that any code anywhere in your program can access that thing. Code from any class can call a public method, read or update a public field, use a public constructor to make an object, and make use of a public class. By contrast, when you declare something private, it tells Java that only code inside of this particular class can see the thing you declared private. These two fields are declared private, so all the code inside of the CaesarCipher class can read and update them, but code outside of the CaesarCipher class is not allowed to access them at all. So what happens if you write code that tries to access private fields or methods from outside the class? The Java compiler will give you an error like this, which says that you are not allowed to do that. In general, when you get this sort of error, it either
means that you are improperly using a class, especially if it's a class that is part of an existing library, or that you have designed a class and made something private that should not be. So why do you want to declare things as private? After all, it seems like it would just be easier to make everything public so you can just access whatever you want wherever you want. Remember the idea of abstraction, the principle that you want to separate the interface from the implementation. Restricting the visibility of the implementation details helps you enforce the abstractions you design. You can make the interface of a class, all the methods that other classes are supposed to call, public, and then you can make the implementation details private. No other class should know about the implementation details directly, and declaring them private lets you enforce that rule in your code. In our Caesar cipher example, you want other classes to be able to call encrypt, but they should not know about the implementation details, such as the fact that you made a variable called shiftedAlphabet. Keeping these details private means that you can change them and be sure that no other code relies on the private implementation details. As you start to think about designing your own classes, there are a few general pieces of guidance to get you started on how to choose public or private. First, fields are generally part of the implementation of an object, so they typically should be private. For methods, it depends on what the purpose of the method is. If the method is part of the interface you want your class to have, that is, part of the behaviors you want it to provide to other pieces of code, you should declare that method public. On the other hand, some methods are helpers. You write them to extract out specific complex tasks which are not meant for other classes to call; they just help accomplish the public interface. These methods should be private so that only the code in your class can call them. For classes, you
should always declare them public. As you become more skilled in Java, you will learn some more advanced topics that lead to situations where you might want non-public classes, but for now, always use public. Likewise, for constructors, you should always make these public for now. Typically, constructors are part of the public interface of a class: they specify how to make an instance. There are situations where non-public constructors are appropriate, but as with non-public classes, these only come up when you have learned some more advanced topics. So for now, you should just make constructors public. Great. So now you know what public and private mean, why they are useful, and some general guidelines for how to use them in your own classes. Thank you. Welcome. The last object-oriented concept that you are going to learn in this lesson is constructors. Recall that the object-oriented CaesarCipher we wrote had a constructor which took the key as a parameter and initialized the fields of the object. When you want to write a constructor, there are certain rules for declaring it. The first is that its name has to be exactly the name of the class it is in. Here the constructor's name must be CaesarCipher because it's in a class called CaesarCipher. The second rule is that the constructor has no return type, not even void. Normally you would write a method's return type here, between public and the name of the method, but for a constructor you do not write anything between them. Like a normal method, a constructor has its parameter list in parentheses. You can make constructors with any number and types of parameters you want. You can have no parameters, or one, or several. Here we have one parameter of type int, the key. Finally, a constructor has a body. Like a normal method, you can write whatever code you want in the constructor's body to specify how to initialize the object. Constructors are not quite like normal methods, however. You do not call them directly. Instead, they are automatically
called as part of newing an object. When you create a new object, the constructor is invoked to initialize that object immediately after the new object is made. This is one of the great benefits of constructors: they allow you to specify how an object should be initialized, and you can be sure that code will always be run for every object as soon as it is made. You do not have to worry about bugs in your code from forgetting to call some initialization code. What happens if you do not write a constructor for your class? Whenever you write no constructors, as you have been doing up to this point, the Java compiler will provide a default constructor for you. The constructor that the compiler provides looks like this. It is public, which you should recall means any piece of code may use it to initialize an object. It takes no arguments, so you would pass no arguments when you create new objects, and it does not do anything; there is no special initialization for any object. Now that you know the rules about constructors, let's see how a constructor works. Let's suppose you had a line of code somewhere else in your program like this: CaesarCipher cc = new CaesarCipher(22). This line of code tells Java to make a new instance of the CaesarCipher class and initialize it by passing 22 to your constructor. Let's see what this does, starting from the beginning. The first thing it does is create a new variable, cc. Then it creates a new instance of the CaesarCipher class. That means you have a new object with its own copy of the fields of that class, alphabet and shiftedAlphabet. Then it calls the constructor, passing 22 for the key. Once inside the constructor, Java begins executing the code you wrote to initialize the object. This code initializes alphabet, then initializes shiftedAlphabet. Having finished the constructor, Java returns back to where you are creating the object. The key was just a parameter to the constructor, so it only exists during that call. However, the fields are part of the object, so they continue to exist in it. Java
then finishes the assignment statement by initializing the cc variable to the newly created object. Great. Now you know the basic rules of how to write constructors, what they do, and how they are used to initialize objects. Thank you. In this lesson, you learned some of the basic concepts of object-oriented programming. You learned about encapsulation, the idea of putting code and data together in an object such that the methods of the object act on the data that's inside the same object. You learned about fields, which are also called instance variables; they let you declare data that should be inside of objects. And you learned about visibility modifiers, private and public, which let you expose or restrict access to fields and methods so you can enforce abstractions and provide the interface you want. Finally, you learned about constructors, which let you write code specifying how to initialize the objects you create. When you do object-oriented programming, it's called OO programming, giving you 007 license to code. Hello, welcome to this module, in which we'll use the idea of creating random, hopefully entertaining stories to motivate and explore important Java concepts. Here's an example of a randomly generated story. We'll use a couple of these stories to introduce our Java concepts. Let's look at this first story. My name is Albert and I live in France. One day I'd like to travel to Mexico because I've never seen a tiger, and I read that they have gigantic, slippery ones there. However, because 55 orange gigantic wheelbarrows make it difficult to travel, I may have to travel to Ecuador instead. That would be okay, but it might take 305 minutes to get there. We'll look at another story with the same pattern. Perhaps you can see some of the similarities to the previous story. My name is Vivian and I live in India. One day I'd like to travel to the United States because I've never seen a pangolin, and I read that they have furious and funny ones there. However, because 95 purple slippery houses make it
difficult to travel, I may have to travel to China instead. That would be okay, but it might take 445 hours to get there. There are many similarities in these stories, which isn't surprising since they were generated from a common template. You'll be exploring and modifying the Java program that reads the story template, finds replacement words where those are needed, and generates a story. The similarities in these stories are captured in the template, parts of which are replaced with words chosen randomly from lists of words of different types. For example, your program might choose a name like Albert or Vivian and choose a country like India, France, China, the United States, and more. Animals chosen could include tigers, pangolins, and whatever you might imagine, dragons for example. Your program can choose nouns, and as you'll see, the choices can make for interesting stories for other learners as you explore these new Java concepts. Let's get started. You're going to think about the design and implementation of a class for generating random stories. This problem is a bit larger than the ones you have solved so far, so you will need to think about the methods that you need and how they work together. The same ideas that you are used to will still guide you through this. You will not only need to think about the algorithm, but also the data involved and how to represent it. Data and methods guide you through designing a class. As always, you want to work the problem by hand before you do anything else. You could use a story template like this with pencil and paper, by yourself or with friends. This template is similar to one you've seen before. It uses words and labels to create an interesting story. Let's look at how you might create a story from this template. This story template starts out "My name is"; then the template requires a name. Remember that a label in angle brackets requires a replacement. Here we want a name, so I'll pick my own, Drew. After that it goes "my job is to", and then we need a verb. I'll
let my friend pick a verb, ride, and a noun, dinosaurs. "If you have ever seen one of these, you know that this job is really", and then an adjective. Adjective: fluffy. Yep, that sounds like a great job to do when I retire: ride fluffy dinosaurs. Now, if you started developing the algorithm for this random story program, you might end up with something along these lines: we read each word, saw if it had angle brackets around it, and then, if it did, picked a random word from that category. If there were no angle brackets, we just kept the word. But as always, we need to be careful, as it is easy to mentally gloss over things that happen naturally for you. In particular, we picked random words for each category, but how did we do that? More importantly for this problem, how would you make a computer program do that? If I ask you to think of an animal, you can just do it, and it may seem hard to explain an algorithm to think of an animal. As a human, you just know what animals are. You implicitly have a mental list of animals and can just name one of them. It may not be truly random, since maybe you just saw a cat recently or were thinking about your pet dog, but picking some animal is easy. For a computer, however, you need an algorithm, and it needs to have data to operate on. The program will need an explicit list of animals to choose from, which could be written into the program source code or read from a file or from the internet. So if you think about these steps, there was a step that was implicit for you but needs to be explicit for the computer: making a list of animals, or more generally, making a list for each template label, not just for animals. You should also think about this step: reading each word in the story template. Where did these words come from? This should be some sort of input, like a file or a website. Your program will need to read that file or website, which makes use of familiar classes like FileResource and URLResource. You might also notice that some of these steps are a bit complicated.
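The hand-worked steps above (read each word, check it for angle brackets, and pick a random word from the matching list) could be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the course's actual program: the class name StoryTemplateSketch, the method names, and the word lists are hypothetical, the lists are written directly into the source code rather than read with a FileResource or URLResource, and the template is passed in as a plain String.

```java
import java.util.Random;

public class StoryTemplateSketch {
    // Hypothetical word lists, written directly into the source code.
    private static final String[] NAMES = {"Albert", "Vivian", "Drew"};
    private static final String[] VERBS = {"ride", "paint", "juggle"};
    private static final String[] NOUNS = {"dinosaurs", "wheelbarrows", "houses"};
    private static final String[] ADJECTIVES = {"fluffy", "slippery", "gigantic"};

    private final Random random = new Random();

    // The step that is implicit for a person but explicit for a computer:
    // choose the list for this label, then pick a random index into it.
    private String pickRandomWord(String label) {
        String[] choices;
        if (label.equals("name")) {
            choices = NAMES;
        } else if (label.equals("verb")) {
            choices = VERBS;
        } else if (label.equals("noun")) {
            choices = NOUNS;
        } else if (label.equals("adjective")) {
            choices = ADJECTIVES;
        } else {
            return "<" + label + ">"; // unknown label: leave it unchanged
        }
        return choices[random.nextInt(choices.length)];
    }

    // If the word contains a label in angle brackets, replace the label;
    // otherwise keep the word. Text around the label (such as a period) is kept.
    private String processWord(String word) {
        int first = word.indexOf("<");
        int last = word.indexOf(">", first);
        if (first == -1 || last == -1) {
            return word;
        }
        String prefix = word.substring(0, first);
        String label = word.substring(first + 1, last);
        String suffix = word.substring(last + 1);
        return prefix + pickRandomWord(label) + suffix;
    }

    // Reads each word of the template in order and builds the story.
    public String fillTemplate(String template) {
        StringBuilder story = new StringBuilder();
        for (String word : template.trim().split("\\s+")) {
            if (story.length() > 0) {
                story.append(" ");
            }
            story.append(processWord(word));
        }
        return story.toString();
    }

    public static void main(String[] args) {
        StoryTemplateSketch sketch = new StoryTemplateSketch();
        String template = "My name is <name>. My job is to <verb> <adjective> <noun>.";
        System.out.println(sketch.fillTemplate(template));
    }
}
```

Keeping the random choice in its own method means the main loop only has to decide whether a word needs replacing; where the lists come from can change later without touching that loop.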
Making a list of words in each category might require more than a few lines of code, though using a FileResource or URLResource will help. Picking a random word might also require some planning and programming. It is perfectly fine for your algorithm, and thus your program, to end up with complicated steps. These steps may be names of other methods you will need to write. For example, you might write a method to pick a random word from a category. Suppose the method were named pickRandomWord. If you have this method, the corresponding step in the algorithm is now just one line of code: you just call the method pickRandomWord and it does the work for you. Working through the algorithm development helps you figure out what methods to write. As you write each of these methods, you may in turn find you need yet more methods. Don't let this worry you. As you break the large problem down into many smaller problems, the methods you find will often be simpler than the ones you started with. To make the lists of words, you will need some variable to hold the data, but how should you store this data? What type is each of these lists of words for template labels? You have seen two types that would work, an array of Strings and a StorageResource, but neither one is ideal for this problem. Each of these structures has benefits and drawbacks. The StorageResource class is relatively simple to use. Your code can add elements to a StorageResource without knowing how many elements are going to be added, that is, without knowing the number of colors or nouns or names that will be added. Accessing StorageResource elements requires using a for loop to iterate over all of them. This will make choosing an element at random a little tricky to code. On the other hand, String arrays have almost the opposite benefits and drawbacks. It's simple to choose an element at random: pick a random index less than the size of the array and return that element, the element at index 2 or 7. However, declaring an array variable requires
knowing how many elements will be stored. That makes arrays not always the right choice. We could use either a StorageResource or a String array to implement this program, but we'll see that a new concept, the ArrayList, combines the best aspects of both arrays and StorageResources. Happy coding. Welcome back. In this lesson, you'll learn about a new class, a very important part of Java that combines essential features of the StorageResource class and an array. In fact, this new class, the ArrayList, is the foundation of how the StorageResource class is implemented. To motivate the problem, think about counting how many different words there are in a file or a web page. The same problem is encountered in finding the number of unique IP addresses that visit a website in a day, a key part of charging a company for online advertising. You've seen how to count the number of each type of nucleotide in digitized DNA, and you used an array to count the number of occurrences of each alphabetic character in cracking a Caesar cipher. These are first steps in counting how many times each word occurs in a file: how many times "the" occurs, or "cat", or the word "albatross". One important part of solving that problem is finding how many different words there are, so that "the" counts as one word rather than 573 if it occurs that many times in a document. In the image below, there are only three different digits, four, six, and seven, even though there are hundreds of digits shown. We'll look at StorageResource as a way of solving this problem. First, we'll count the total number of words in a file or web page. The class StorageResource makes this easy. To count the words in a file or web page, you'll iterate over either a FileResource or a URLResource, as you can see in the highlighted code. Iterating over a file or web page using these resources uses nearly identical code, and here you see that simply calling the .add method adds each word to the StorageResource instance variable myWords. When done, the
.size method will provide the total number of words read. The method getCount returns this number, again using the instance variable myWords, which was initialized in the constructor and added to in the method readWords. As you'll see, it's easy to use StorageResource to count the number of different words, not simply the total number of words. This code can be easily modified to find the number of unique or different words in a file or web page. The field myWords, whose type is StorageResource, can store all the words, as you've just seen. As shown in the code here, the .add method adds every String read to myWords, but it's simple to guard the call to .add with code that only calls .add when the word is seen for the first time, when it hasn't been stored in myWords yet. The .contains method returns a boolean value. In the code here, this value is used to ensure that the .add method is called only when the StorageResource object myWords does not contain a word. However, the StorageResource class is not a good choice for choosing elements at random, a key part of the storytelling code, GladLibs, that we're working on. To choose an element at random, we must use the Iterable interface that StorageResource provides. This means we use a loop to access every element in the StorageResource object myWords. In the loop here, we'd really like to iterate as many times as the value stored in the variable choice, because we want to choose a random element of the StorageResource. The code here returns a random String when the value of choice reaches zero. As you can see in the code, the value of choice must reach zero, since it starts as a value less than the size of myWords and is decremented by one each time through the loop. However, the Java compiler analyzes a loop with an if statement and doesn't know that the if condition must be true at some point. The compiler indicates it's an error to be missing a return statement after the loop, even though that
part of the code is never reached. It would be simpler to code, and much, much faster, to use a String array to get a random element. As shown in the code here, we simply generate a random integer and use that as an index into the array. Unfortunately, we must specify the capacity of an array when we create it; arrays do not grow in the way that a StorageResource object grows. The class ArrayList provides a solution and combines the best features of StorageResource and arrays. The class ArrayList is from the java.util package, the same package that contains the Random class we've used. An ArrayList expands its capacity when its .add method is called, just like a StorageResource object. An ArrayList also provides access via indexing, so that the 0th or 101st element can be accessed without iterating over all elements, just like an array. The StorageResource class uses an ArrayList internally; in fact, it's simply a little easier to use than an ArrayList. But as you become more experienced, you're now able to use an ArrayList, which stores any kind of object. The basic syntax of the ArrayList class is shown here, but you'll see it used in a coding example in the next lesson. To declare an ArrayList variable, you must include the type of object stored in the ArrayList using angle brackets, as shown here. Like any object, creating one requires calling new and using the class name as a constructor call to create and initialize the object; that's shown here, and the result is assigned to the variable words. Then you can call the .add method to add Strings to the ArrayList object, and you can call the .get method to access a particular element via indexing, just like the bracket notation used with an array, but with an ArrayList you'll use the .get method. The .set method can change or set the element at a particular index in the ArrayList object. Here the first, or 0th-index, element is set to a new string. We'll see more examples as we go through a coding example with the ArrayList. The array
list class is more powerful than storage resource and can sort any kind of object it will be an essential class in solving many problems with Java thank you we're going to show an example of how the array list structure works we'll show code that keeps track of how many times each word in a file occurs the file could be a URL could be a large file like text from Shakespeare's Romeo and Juliet or like the sayings of Confucius we don't know how many different words there are before we start reading the file so we can't easily use an array instead we'll use two array list structures one that holds strings and one that holds integer values we'll need integer instead of int we'll explain why the case value of the array list whose name is myfreaks we'll store an integer the number of occurrences of the case value in the instance variable array list mywords the names are names of instance variables that we'll see in the code as shown here we see two occurrences of the with a string and an integer stored in the location with index zero in the two different array lists similarly there are three occurrences of the word dog and one occurrence of the word green let's write code we've got a class word frequencies I'm going to open it up and take a look to see how it works and one way we can easily see how it works before we look at the code is simply to run it so I've created an instance of it on my object workbench and I'm going to right click and run the tester method I'm going to open the file Romeo dot text the entire text of Shakespeare's Romeo and Juliet and I can see that there are 5,895 unique words what I'd like to do with array lists is count how many times each word occurs so I'll find out how many times the occurs, Montague Capulet, all the words from Romeo and Juliet just going through this code very quickly to highlight the key features before I add the counting I have an instance variable mywords which is initialized to be able to hold strings in the constructor 
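The two parallel lists described above can be sketched roughly as follows. This is a minimal illustration, not the course's WordFrequencies class: the names myWords and myFreqs come from the lecture, while the class name ParallelListsSketch and the methods countWords and countOf are hypothetical, and a plain String array stands in for reading words from a file.

```java
import java.util.ArrayList;

// Hypothetical sketch of counting with two parallel ArrayLists.
class ParallelListsSketch {
    // myWords.get(k) is a word; myFreqs.get(k) is how many times it occurred.
    private final ArrayList<String> myWords = new ArrayList<>();
    private final ArrayList<Integer> myFreqs = new ArrayList<>();

    // Assumption: words arrive as an array rather than from a file.
    public void countWords(String[] words) {
        for (String s : words) {
            String w = s.toLowerCase();
            int index = myWords.indexOf(w);    // -1 when w has not been seen
            if (index == -1) {
                myWords.add(w);                // first occurrence
                myFreqs.add(1);
            } else {
                int value = myFreqs.get(index);    // step 1: find the value
                myFreqs.set(index, value + 1);     // step 2: store value + 1
            }
        }
    }

    // Returns the count for a word, or 0 if it was never seen.
    public int countOf(String word) {
        int index = myWords.indexOf(word.toLowerCase());
        return index == -1 ? 0 : myFreqs.get(index);
    }

    public static void main(String[] args) {
        ParallelListsSketch counter = new ParallelListsSketch();
        counter.countWords(new String[] {"the", "dog", "The", "green", "dog", "dog"});
        // Print each count, a tab character, then the word itself.
        for (int k = 0; k < counter.myWords.size(); k++) {
            System.out.println(counter.myFreqs.get(k) + "\t" + counter.myWords.get(k));
        }
    }
}
```

With this sample input, index 0 of both lists describes "the" (2 occurrences), matching the picture of a string and an integer stored at the same index in two different lists.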
In tester, I call the function findUnique, which creates a FileResource, and because it had no parameters, that allowed me to open up any file I wanted. It loops over every word and converts each word to lowercase. If the word has never been seen before, which I find out by using the ArrayList method .indexOf, which is a method whose same name we've seen in the String class, if it has never been seen, I add it to myWords. Now what I want to do is have a parallel array. I will call it myFreqs, for my frequencies. So I need another instance variable. This has to store integer values, and unfortunately, with ArrayLists in Java, I must use Integer. You've seen this before with Integer.parseInt, similar to the class Double and Double.parseDouble. I'll explain this as I go. To store integers, I'm going to create a place for it. I'm going to initialize that in the constructor to a new ArrayList of Integer values. And then, if I've never seen the word before, that means this is the first occurrence, so I'm going to add to the end of the ArrayList the value one. This is just like the idea of the first time you see a word, it's occurred once. But if I've seen it before, this is in my else statement, if I've seen it before, I know where it is, because index tells me. So what I'm going to do is find the value that's in myFreqs. This is the number of times it's already occurred. I can get this value using the get method, and then I can set the value in myFreqs to value plus one. So let me make clear what I've just done there. I've accessed the value at the location specified by index, which was returned to me by the .indexOf method. And the idea here is that if the word "the" occurs in location 500 of myWords, the frequency, or number of times that it occurs, is in location 500 of myFreqs. They match up exactly. Oh, I hit compile: cannot find myFreqs. So I've looked up here, and I can see myFreqs, myFreqs, myFreqs. That's because I forgot to say set; the name of the method to set the value is set. Now I'm ready
to run it, but I haven't printed any values. After I've tested them all and found them, I will literally print every single word and how many times it occurs. So I'm going to use the standard for loop that loops over array values or ArrayList values. size is the method that tells me how many there are, and I'm going to print the number of occurrences, which is in myFreqs, a tab character, and then the word itself. That compiled. I'll create a new one by right-clicking and creating a new object, and then I'm going to run the tester method on this object by right-clicking. There's the right-click, run tester. Let's count Romeo, and you can see how many times different words have occurred in Romeo. For example, "the" has occurred 677 times, "Romeo" 48, and "Juliet" 23. Another word that occurs often is "Juliet." with a period; I haven't taken punctuation into account at all. And "from" occurs 86 times. Let's make sure that part of the code is clear. I've accessed the value in myFreqs and stored it in the int value, and then I've set the value at index to value plus one. Because ArrayLists use Integer rather than int, this is a two-step process: find the value and then store the value. That's what we need to do with Integer, which would be a little bit different than what we had with int. We'll see that in a later video. Let's look down here. myFreqs.get(k) returns the kth value of the frequency ArrayList, and myWords.get(k) returns the corresponding word. The 23rd frequency is the number of times the 23rd word has occurred. Now that you know a little more about ArrayLists, let's write some code. You've seen ArrayLists at work and how useful they are. Arrays are extremely useful too, so we're going to go through a quick walkthrough of code to show where arrays don't work so well. Creating an array is easier in terms of syntax than creating an ArrayList: far fewer characters to type, for example. It's much easier to access values in an array, since a[k] can work to either read from an array
location or write to the array location, given the index k. In contrast, with an ArrayList you use .get and .set for reading and writing, respectively. With int values, arrays often have more benefits in some ways than ArrayLists do. Although most conversions between int and Integer happen automatically, occasionally these conversions can lead to hard-to-find bugs if you don't have a thorough understanding of how the int and Integer conversion works. It's easy to increment the value in an array given the index. However, in an ArrayList, you have to call .get and .set, since the code to simply increment the result returned by .get does not work. However, arrays don't grow, and that's a really large concern. Let's write some code. Now we want to find the number of unique words in a file, but we want to use, or at least try to use, an array. So that's what we're going to do here. I've started this program. The class is called WordsWithArrays, and the first problem we run into is we want to read in all the words from a file, but we don't know how many words there are in the file, so we don't know how big to make our array. So we can't really use an array for that. We'll use a StorageResource for that part of the program. I've already started here. We've got myWords, which is a StorageResource. We initialized the StorageResource in our constructor, and then we have readWords, which is going to read all the words from the file and put them into our StorageResource myWords. Notice it's also going to add them in as lowercase, so all the words have been lowercased. We have a method called contains, where we're going to pass in an array of type String and a word, and we want to know: is that word in our array? So what contains is going to do is look through the array and see if the word that we're passing in matches anything, and if it does, it returns true. If we go through the whole thing and we don't see it anywhere, we're going to return false. Now we have numberOfUniqueWords, and
the first thing we've done is create an array here to store all the words that are unique. You can see I've started that here. I've gone ahead and put in words, which is an array of type String. I have to create a new one, so I do that, and then I get to the part about the size, and I don't know how big to make it, because I don't know how many unique words there are going to be. So the only thing I can do is just make it as big as my StorageResource. So I made the size myWords.size(). That's the only safe thing to do, because all the words could be unique. Now we're going to iterate over myWords, and we're going to check and see, for each one, is it in words, which is going to be just the unique words. So is it already in there? If it's not in there, we found a new unique word, and we're going to put it in there. So you can see here we're going to add it in, and then we're also keeping track of how many unique words we have, because this method is going to return the number of unique words. And so every time we find a new unique word, we're going to add one to that count. The next thing we have is a tester method, so we can test this out, right down here. So we're just going to call it and test it out. Let's compile this and create our object, and then we'll call the tester method, and we have to pick a file, so I'm going to pick confucius.txt. Oh dear, we got an error. It says down here we got a null pointer exception. Also, over here, this is our output. You can see that it did read in all the words from the file; it says 4542 words read in. But you can see also it got a null pointer exception, and you can see where that exception is: it says in tester on line 45, and then in numberOfUniqueWords there's a line for that, and then contains on line 23. And so that top one is probably where the error is. If we click on that, it goes to where this error is, highlighted here, and you can see our error is in contains. So what is the problem? Well, the problem is we're using this array to
put all the unique words, but we don't have any in there yet, and then we're actually iterating over the whole thing, which is all empty. It's initialized to null, and so we're checking: does a value that's null equal word? And that's why it crashed, because you can't ask a null value whether it equals a string. So we need to fix this. What we really want to do is keep track of how many unique words we have in there, because we just want to check the unique words that we've actually put in there. And so what we'll have to do in order to fix that is, first of all, add another parameter, so we know how many words we've actually put in there. So I'm going to add a parameter here called number, and we actually have to give it a type, so it's going to be an integer: int number. And when we iterate, we just want to iterate over the words we put in there, so we want to replace list.length. Instead of looking at the whole gigantic array, we want to look at just the ones that are already in there, and number tells us how many are currently in there. Now we also have to fix where we call contains, which is down here, and we have to put a value here. That value, remember, we're keeping track of how many words we put in there that are unique words; that is the variable numStored. So we'll pass numStored here. Let's compile that and see if that works. We got no syntax errors. Let's try and run it. And it works. So you can see here we've had a lot of trouble trying to use an array. This problem really should use an ArrayList, because here what's happened is there are 34,000 words, of which only 6,558 are unique. So that means the array we're using is of size 34,000, but there are only about 6,500 unique words, so we have a lot of extra space. That's another reason why an ArrayList would be better for this problem. All right, happy coding. Hi. As you get ready to start the program to create stories from templates, let's summarize what you've learned about the ArrayList class in Java.
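The fix walked through above, passing the number of filled slots to contains so that the null slots are never examined, might look something like this. It is a sketch under assumptions: the names contains, number, words, and numStored follow the lecture, but the class name UniqueWordsArraySketch is hypothetical, and a String array parameter stands in for the StorageResource myWords.

```java
// Hypothetical sketch of counting unique words with a fixed-size array.
class UniqueWordsArraySketch {

    // Looks only at the first `number` slots; the rest are still null.
    public static boolean contains(String[] list, String word, int number) {
        for (int k = 0; k < number; k++) {
            if (list[k].equals(word)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: allWords plays the role of the lecture's myWords.
    public static int numberOfUniqueWords(String[] allWords) {
        // Worst case every word is unique, so the array must be this big.
        String[] words = new String[allWords.length];
        int numStored = 0;
        for (String s : allWords) {
            String w = s.toLowerCase();
            if (!contains(words, w, numStored)) {
                words[numStored] = w;   // store the new unique word
                numStored++;
            }
        }
        return numStored;
    }

    public static void main(String[] args) {
        String[] sample = {"to", "be", "or", "not", "to", "be"};
        System.out.println(numberOfUniqueWords(sample)); // prints 4
    }
}
```

Looping to list.length instead of number would call .equals on a null slot and throw the null pointer exception seen in the walkthrough. The array here also stays oversized whenever words repeat, which is the wasted space the lecture points out.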
ArrayLists are like arrays: both are indexable collections, allowing you to access elements with an integer index. ArrayLists can grow as elements are added to them. This means you don't need to know in advance how much space to allocate for an ArrayList, like you did for an array. As with arrays and the individual characters of a string, indexing in an ArrayList starts with 0. It takes the same amount of time to access the first element of an ArrayList or an array as it does to access the 10,000th element. It might help you to think of ArrayLists as a collection of boxes, each addressable with a number. To use the ArrayList class, you must import it from the java.util package. You can import just the ArrayList class, or use the asterisk and import java.util.* to gain access to all of the classes in the package, like the Random class. When you create an ArrayList, you specify the type of element stored in the ArrayList using the angle bracket syntax that Java uses for generic, or general, elements. Here you see an ArrayList that can contain String objects, but ArrayLists can store Integer objects too, though a list must store either integers or strings, not both types in one list. The Integer class allows int values like 0 or 57 or negative 352 to be stored in the ArrayList. Java automatically converts an int value like 57 into a value stored as an Integer object. You've seen several methods used with ArrayList objects. The .add method adds an element to the end of an ArrayList; the ArrayList grows as needed. The .size method returns the number of elements stored in an ArrayList; typically, this is the number of elements added via .add. You can write code to access individual elements with an integer index using the .get method, and you can change the value stored at a specific index using the .set method. ArrayLists are typically processed and accessed using loops. Here's a typical indexing loop, which processes each element of an ArrayList. These for loops typically start at 0 and
loop to less than the size of the ArrayList, which is exactly .size elements. Within the loop, each ArrayList element is accessed using the .get method and the loop index variable. When accessing ArrayList elements in a loop like this, do not call .add or .remove, which will change the size as the loop iterates, typically causing a problem in your algorithm, because you will either skip elements or access invalid elements. You can also access the elements of an ArrayList with an iterable loop, the same kind of loop we used with the edu.duke iterable classes. In an iterable loop, your code indicates the type of value stored in the ArrayList. Your loop variable takes on each value stored in the ArrayList, one at a time, just as with the FileResource or ImageResource classes. You can use this kind of loop when you don't need the index of each ArrayList element, but just the element itself. Just as with an indexing loop, do not call .add or .remove with an iterable loop. In this case, Java will generate a runtime error if you try. ArrayLists are a very useful tool when used properly. Now that you've learned about ArrayLists, it is time to work with the GladLib.java program. You'll learn how it works to create stories with new kinds of data. You'll need to understand the program so that you can modify it. As a programmer or software engineer, sometimes you'll be creating programs from scratch, and sometimes you'll be modifying programs: enhancing them, making them more robust, and more. In this program, there are many pieces, many methods, each of which you could write yourself. This means you'll be able to understand each method and be able to modify it. Since you'll start with a working program, you'll be able to make improvements and additions while testing your program to see that it continues to work correctly. You'll be able to understand class design, method design, and the limitations of the GladLib.java program you start with as you make improvements to the code. As you work to understand these
aspects of software design and engineering, you can tell stories about it. As you modify the program, you'll be creating reusable program components and reusable ideas that you'll be able to use as you develop software experience and expertise. We'll take a tour of the program before you begin to make enhancements and improvements. As with all Java classes, a constructor will initialize a GladLib object. The constructor will create the ArrayList instance variables that hold replacements for nouns, colors, and more. It will also create the java.util.Random object used for choosing a replacement at random. The general control flow of the program after initialization will be to read a story template and process each word. If the word is a label, like country or timeframe, indicated by angle brackets, the code will find a replacement at random. After the story is created, the program will print it. Let's look at these pieces in more detail. First, let's take a quick look at creating stories from the template you just saw. There is one public method in the class, makeStory. Calling this method will use a template to generate a story, as we'll see when we look more closely at the code. "In Kenya, a long time ago, nearly 245 decades ago, there lived a pink, funny tiger. It so loved to sing and dance, but there was an angry, gigantic lion named Lance that scared it so much." "In Ecuador, a long time ago, nearly 105 months ago, there lived a jovial yellow polar bear. It so loved to sing and dance, but there was a furious, angry rabbit named Albert that scared it so much." Reading words from the template and printing the story will be code you likely won't need to modify. Calling makeStory will read a template from a file or URL and loop over each word in the template. If the word is a label with angle brackets, it will be replaced. Finding labels is a straightforward use of .indexOf and .substring in the private method processWord. We use these methods to ensure that punctuation or letters before and after
the angle brackets are preserved. Printing the story will display the final result in the console window of BlueJ or a different programming environment. The private method printOut has a parameter to specify the line width, so you can create a story and use 80 or 40 characters or any other number. You could modify the printOut method to write the story to a file, too, using the edu.duke.FileResource class. Changing the GladLib.java program requires understanding how the ArrayList instance variables are used. There is one ArrayList for each possible label in a story template, like noun or color, and instance variables should be named appropriately, so that programmers will be able to easily understand their use in reading and modifying code. As with all instance variables, or fields, they'll be created and initialized when the GladLib constructor is called, either using new or from within BlueJ when you create an object. The program could use all the fields when replacing words as part of telling a story. The field adjectiveList will hold replacements for the label adjective, the field nounList will be for nouns, and the instance variable colorList will hold colors to be chosen at random. Each field holds replacements for the label that's part of its name. This is a convention in the program, not required by Java, but following the convention will make it simpler to create a new instance variable, like verbList, for a new label. We'll look at each use of these fields. One is finding a substitute for a label like color. Based on the word that's part of a label, like color or noun, the private method getSubstitute will access the appropriate instance variable to find a random replacement. For example, if the label is color, then a replacement will be chosen from the field colorList. If the label is noun, then nounList is used. You can see in the method getSubstitute that adding a new label will require adding a new if statement to access the appropriate ArrayList. A value is
chosen at random using a private method, randomFrom. Both randomFrom and getSubstitute are private; they're called as a result of calling the public makeStory method. When getSubstitute calls randomFrom, getSubstitute will always pass one of the instance variables, such as adjectiveList, nounList, etc., as the value for the parameter source. Initializing the ArrayLists is straightforward, but you'll need to understand it to create a new field for a new label. All the ArrayLists must be created and initialized when the constructor is called. The constructor will call a private helper method, initializeFromSource. The source for the data for colors, nouns, and so on could be a URL or a file. Calling initializeFromSource will result in reading files or URLs to store strings in each ArrayList. If the parameter to initializeFromSource begins with http, then eventually a URLResource object will be used to read data; otherwise, a FileResource object will be used. You can't see that in the code here, because the parameter source is simply passed to the helper method readIt, where the reading code is located. Let's summarize how the instance variables for replacing labels are used. You'll need to understand this to enhance the program by adding a new label. For example, to create a label like verb, you'll need a new instance variable. You'll name it appropriately, like verbList. You'll need to modify code in two methods with the addition of verbList. You'll modify the method initializeFromSource, a private method called from the constructor. You'll modify the code in the method getSubstitute, also a private method, called by the public makeStory via the private methods fromTemplate and processWord. The program documentation should include information like this to help you, the software engineer, make modifications and enhancements. Have fun telling stories and writing code. We'll outline some of the design features of the GladLib class and talk about software design in general. Adding a new label like verb means
modifying the GladLib.java implementation in several different places, and requires following a naming convention used in the class. You'll need to create a new instance variable for the ArrayList that stores examples of verbs, like run, think, or swim. You'll need to initialize the ArrayList using the method initializeFromSource, called from the constructor. You'll need to get a random verb when needed by modifying the code in the method getSubstitute. You should follow the naming conventions used in the class, where the label noun is associated with the instance variable nounList, just as the label country is associated with countryList. This means you should use the name verbList for a field that's the ArrayList that stores strings that are verbs. As you modify and extend programs and classes, you'll gain experience with many kinds of programming and design. You'll gain experience that will help you make good decisions when programming and create good designs. But you'll see that sometimes experience comes from bad judgment or bad designs that nevertheless allow you to reason about the tradeoffs in doing things in more than one way. So you might realize that a choice that works can lead to code that's not easy to maintain. Some software designs are called brittle, meaning that the software or design breaks when you try to extend it or use it in ways that are a little bit different from what was initially intended. Flexible designs, on the other hand, are better able to cope with changes in the software. When you learn about object-oriented design, you might come across a principle that says code should be open for extension but closed for modification: the open-closed principle. The idea is that you should be able to extend software without extensive modifications to the existing code. That's more possible with object-oriented design ideas that we won't be able to cover in this course, but you can still create designs that are more open than others. You'll be able to understand a
better design after working with this design and implementation. The GladLib class does have some good features. It's relatively easy to understand each method, and the code works. It is possible to extend the code, as you'll see, even if the extension requires changing the code in several places. You'll be able to create a better design after learning some new Java concepts we'll introduce soon, but those concepts will be clearer because you'll have this experience with this code and class, and you'll understand why the new Java features can make the code simpler. The new Java features will let you keep changes in one place, rather than being sprinkled across three parts of the class. The new features will also minimize the duplicated code that's in the current implementation. Have fun making new stories. Hi. We're going to walk through the process of adding a new label to the GladLib.java program, the GladLib class. A new label might be something like verb, which would go along with noun, color, and the other labels we have. To do that, you've seen part of that process outlined in one of the previous lessons. What I'm going to do here is show you that to add a new label, we need a new template file. So I've taken the standard template file that we had before, which was called madtemplate.txt, and I've replaced "sing" and "dance", the two things that a creature would do in the stories we generated, with verb and verb. So I'm adding the label verb to replace "sing" and "dance" in the original story. I've still called this madtemplate.txt. You'll use whatever text editing software you have, which is usually something like TextEdit on a Mac or Notepad++ on a Windows machine. Now I'm going to take the GladLib program that we've used before, and I'm going to walk through the locations where we need to make changes. I need to create a new ArrayList to store my verbs. I'm going to go along with the same conventions that have been used before and make sure that I use verbList to go along
with adjectiveList, nounList, etc. As I scroll through my source code, I see that these ArrayLists are initialized in the initializeFromSource method, which is called from our class's constructor. I need to initialize verbList, and it's going to be initialized by calling readIt, passing the source, which is either a place on my computer where it can access files or a URL. This is verb.txt. I'm going to also have to make one other addition, because now my label might be an adjective or an animal or a name, but I'm going to need to copy and paste these so that, in addition to using timeframe, I'm going to use verb. So I'll replace that and make sure I get the syntax right. If the label is verb, I'm going to replace the label from verbList. I'm going to compile my program. I know that in this program it's reading the template from madtemplate.txt, which I've modified to include the angle-bracket verb angle-bracket label. I'm going to create a new GladLib on my object workbench, and then I'm going to run the makeStory method. "It so loved to unknown and unknown. There was a funny, slippery lion named Vivian that scared it so much." So if we go back and look at our source code, we'll see that it must have found verb, but failed to replace it with the words it was supposed to be replaced with. In this case, I've got verbList here, my instance variable. I've initialized it from verb here. I've made sure that if... oh look, if label.equals "verbt": I had a misprint when I spelled it. Now it's got verb. So I'm going to go back and recreate that, create my GladLib object, open it up, and make a story. Hey, "It so loved to think and contemplate, but there was a gigantic, gigantic tiger." Let me run that one more time and see if I get something besides think and contemplate. "It so loved to ride and surrender." One more time, and I'll be convinced that it's actually reading verbs. "It so loved to surrender and surrender." We've done nothing to make sure that it can't use the same word or verb more than once. So, just to remind you what we've done:
I've used this location to change what happens when the label is found. I've made sure that the label was read from the source, and I used verbList here, which is the name of my instance variable, here, verbList, using the same naming conventions we've used in all the other programs. I compiled my program, I tested it, I created a new text file to hold verbs and a new text file for the template. Happy verbing. Hi. It's time to learn about a new way of structuring data. This also helps in structuring classes. You've seen some of the concerns with the design of the GladLib class. The design has flaws, but you still were able to add a new label and create new stories, so the design works. Larger programs with the same flaws might be harder to modify. We've also seen code to count nucleotides in a strand of DNA and letter frequencies to break a Caesar cipher. We extended this idea to counting word frequencies in any file with two parallel arrays. To count nucleotides, we used four int variables to count each time a c, g, t, or a occurred. To count letter frequencies, we used an array of 26 int values, one for each letter a through z. To count words, you saw code that used two parallel ArrayLists. These contained as many values as there are different words being counted. Now you'll see how to use the concept of parallel ArrayLists to help understand the
java.util.HashMap class. Using a HashMap will make it much easier to modify the GladLib class. The HashMap class is also very efficient compared to using two parallel ArrayLists. We'll explain the HashMap class using the parallel ArrayList code to help. You've had a chance to see this code that uses two ArrayLists to count how many times each different word occurs in a file. The method .indexOf finds the location of a word in the ArrayList of strings named myWords. If the value returned by .indexOf is minus 1, then the string read from the file hasn't been seen before, and it's stored in myWords with a corresponding count of 1 in myFreqs. If the string has been seen before, the Integer value of myFreqs at the same index is incremented by 1. This code works, but using a HashMap will make the code much faster, and the HashMap class will make it easier to modify the GladLib class as well. Let's work to understand the concepts in the HashMap class. A HashMap object associates keys with values. In many languages, this is called a map. You might think of the key, or legend, in a map, as you can see here. The key helps in understanding the map: you can look up the color in the key or legend and understand what the color means in the map. In programming, the concept is more mathematical than geographical. The word map has meanings in both math and geography. The key is an element in a domain that's being mapped to a value in the range. In math, a function is sometimes called a mapping, and this expresses the ideas in the HashMap class as well. In a HashMap, you look up a key and you get the value associated with the key. In the example illustrated here, we are counting word frequencies. The word "rainbow" occurs 41 times, so the integer value 41 is associated with the String key "rainbow". In Java, the value returned by the call map.get("rainbow") is 41. The value returned by map.get("truth") is 17, indicating the string "truth" occurs 17 times. If "football" occurs 23 times, then 23 is the value associated with the
key "football" and is returned by the call map.get("football"). "Wonderful" also occurs 23 times in our hypothetical example. The keys in a map are unique, but it's possible to have the same value associated with different keys. To use a HashMap, you'll need to understand the operations it supports. We'll use one HashMap to replace the two parallel ArrayLists. Here's the code that uses one ArrayList of strings and one ArrayList of integers, with the .indexOf method used for associating integer values with strings. The HashMap code also associates an integer value with the string key. The key in the HashMap will be a String. The value is an Integer: that's the number of times the string occurs in the file being processed. To define a HashMap variable, you'll need to specify the type of both the key and the value. When calling new, you must specify the key and the value types as well. The key is the first type, and the value is the second type. The code will determine whether the key has been seen before, that is, whether it's stored in the HashMap or not. The method .containsKey on the map object returns a boolean indicating whether that key is in the map. If it's not in the map, the value one is put in the map with the key, using the map method .put. If the string is a key in the map, meaning the string being processed has been seen before, the value associated with the key is found using the .get method. The value is stored for the key by calling map.put with an updated value of one more. In addition to accessing individual map keys and values, you'll also need to access all the keys and values in a map. Printing the strings and frequencies when parallel arrays are used requires a typical indexing for loop, with the index used to access both strings and integers, as shown here. Printing all the key-value pairs in a map requires looping over all the keys and getting the value associated with each key. The method .keySet returns an Iterable you use to access each key in a loop. This is similar
to using .words or .lines with a FileResource or URLResource to access elements, or the .data method to access each string in a StorageResource. Iterating over the key set and calling get for each key allows you to print the contents of a map. Maps will allow you to modify the GladLib class more easily, but maps are incredibly fast too. When files are large, efficiency matters more than when they're small. This concept of using efficient structures or code is important. Computers are so fast today that simple approaches often lead to code that's fast enough. As you can see in the first row of the table, counting how many times each word occurs in Shakespeare's Julius Caesar is fast enough on a laptop computer, even using the parallel ArrayLists. With a bigger file like the sayings of Confucius, the code for ArrayList is still under half a second, whereas the HashMap code is incredibly fast. For the novel The Scarlet Letter, there are many more unique words and total words, but the HashMap code is still roughly 10 times faster than the ArrayList code. For a large file like the King James Version of the Bible, with over 800,000 words and 32,000 different words, the ArrayList code takes more than 20 seconds, while the HashMap code is under half a second. That's more than 40 times faster. Looking up keys in a map takes time that's independent of the number of keys. Roughly speaking, this means getting the value for a key in a map with a million keys takes the same time as looking up the value associated with the key in a map of just 10 keys. With an ArrayList, you might have to look at all million elements in the ArrayList. So HashMap is incredibly fast. In later courses, you'll study how the HashMap class can be so fast. In this course, you'll use HashMaps in many examples. Happy coding! Hi, today we're going to write some code to find out how many times every word in a file occurs. So we'll find how many times the word "the" occurs, how many times the word
"wonderful" occurs, and we're going to start with a working program. Let me show you very quickly what that looks like. This countWords function that we have here, this method countWords, simply creates a FileResource, loops over all the words in it, and counts how many total words there are. If I take this program, make an object on my workbench, right-click, and call countWords, it will ask me to enter the name of a file. I will use Confucius, and it will tell me that there are 34,582 words in the works of Confucius. Rather than knowing the total number of words, I'd like to find out how many times each individual word occurs. To do that, I'm going to use a map. So I'm going to need to add another local variable here, a HashMap. I'm going to map Strings to Integers. Each string will be a word that occurs in my file; that's the key in this map. And the value in the map is how many times that word occurs. So I need to create a new one. I'm allowed to make a HashMap because I've imported java.util; that's the package in which we find HashMap. Now as I read the words, rather than incrementing the total, I'm going to ask: have we seen w in the map before? So I'm going to ask whether the key set associated with the map contains w, the word I'm looking for. And if it does, I've seen the word before. I'm going to put w back in the map. It's already there, so I'm going to get the number of times that word occurs and add one. I'll go over again what I'm doing in a minute. If I've never seen the word before, then what I'm going to do is put it in my map with the number one, because the word will have occurred one time. Let me compile and make sure I've got that right. So let me go over very quickly what I've just done. I've looked to see: does this word w that I just read and converted to lowercase, is it in my key set? Have I ever seen it before? If I have seen it before, let me get the integer value associated with it, that's the number of times it's already occurred, add one to that, and put that in the
map as a key-value pair with the word and an incremented count. If I've never seen it before, this is the first time. I've now added all the values to the map, and I'd like to print them out. The way I'm going to print them out is to loop over the key set, that's all the words that are keys in my map, and I'm going to get the value associated with each one. That's the number of occurrences; that's the value associated with my word. And if that value is big, and here I'm going to say if it's bigger than 500, I'm going to print the words that occur a lot. So I'm going to print the number of occurrences, a tab character, and the word itself. So I'm looping over the key set; that allows me to find every key. I'm getting the value associated with that key in the map, and if that's big, I'm going to print it. Big in this case is bigger than 500. So I'm going to right-click and create a new object, and I'm going to invoke the method countWords, which before used to count the total number of words and now is going to count all the individual words in Confucius. And here we can see that in the file Confucius there are not that many words that occur more than 500 times. "the" occurs more than 2,000 times; "and" occurs 762. If I want to see a little more in the way of words, I'll say 200, just to make sure that I'm getting something more, and I'll make a new version of this that sits on the object workbench. I'll right-click and call countWords. I'll once again count Confucius, and you can see there are more words. "master" occurs 484 times. Confucius was the master of many things. You are now going to be the master of programming as you use maps to map keys to values to solve many problems. Have fun coding!
In this lesson, we'll look at how we can use the HashMap class to make our GladLib class easier to extend, have fewer lines of code, and be a good example of how to become a more skilled and experienced software designer. As a reminder, extending the class to use a new label like <verb> requires modifying the code in three places. You'll need to create an ArrayList instance variable, initialize it properly, and use it as the source for your random replacements. You should also follow a convention of using field names like verbList for the label verb. This makes it difficult to use text files or URLs for the source of word replacements unless all such sources follow the same conventions, such as using the file name noun.txt for the label noun and the field nounList, or color.txt for the label color and the field colorList. Let's take a look at the concepts behind these requirements for extending the GladLib class, to look at a new way of structuring data in classes. Each label is associated with an ArrayList instance variable. You see the label noun associated with nounList, the label color with colorList, and so on. These named instance variables make for a poor design. Adding a new label like verb requires defining an instance variable by name, initializing it by name, and using it by name. That's three places in which the program must be modified. Instead, we'll use a HashMap to help create a better and more flexible design. The HashMap will allow us to map a label to an ArrayList without ever having to name the ArrayList itself. Given a label, the code will look up, or find, the associated ArrayList in the HashMap structure. As you've seen, this is similar to how indexOf works for finding a value in an ArrayList or a character in a String. Getting the value associated with a label will return an ArrayList. Let's take a closer look at how using a HashMap creates a more flexible design. One HashMap will replace seven or more instance variables. The HashMap will reference as many ArrayLists as needed, rather than us having to define separate instance variables and follow a naming convention, as you see here. The code will use a single instance variable, a HashMap named myMap. This will associate an ArrayList with each label. So the keys in the map
are strings, the labels in the original program. The value associated with each key is the ArrayList of replacement words for that label. This means that to add a new label and a new ArrayList, we don't have to add a new instance variable. We simply need to store new values in the single HashMap instance named myMap. Let's look at how the method getSubstitute works with a HashMap. In the original program, a list of if statements was used to identify the instance variable associated with a particular label. The naming convention of using countryList for country allows a programmer to extend the code, but there will always be as many if statements in the getSubstitute method as there are labels and instance variables. The last if statement is different. You can see that the label <number> generates a random number instead of finding one in a list of numbers. When using a HashMap, the getSubstitute method is much simpler. The HashMap associates a label with the ArrayList of replacements. The ArrayList for a label is accessed using the HashMap .get method, to get the ArrayList associated with a string label like country or noun or color. Adding a new label doesn't require modifying this method at all, and that's an example of the open-closed principle that we talked about in a previous video. Using a HashMap makes for a more flexible class that's easier to extend, but there's room for even more improvements using HashMaps. Again, the original program reads a file or URL to store information in each named instance variable, the ArrayList of replacement values for that label. That was done in a sequence of statements that call the helper method readIt. The HashMap version still associates a label with a file name, and that file name must be specified in the program, but the code is different because we use a loop to associate each label with a file name. This wasn't possible in the original program. Note the private helper method readIt is
still called. What changes if we want to add a new label like verb? The program will still associate the name of the file of replacement values, verb.txt, with the new label. We could store a new string like "verb" in the local String array variable labels. We could add that just after the string "timeframe", for example. Unfortunately, we still have the limitation that the code uses a naming convention for files, like verb.txt for the label verb. We could use a HashMap in a different way to associate file names with labels without modifying the program. The program could be designed to read a file of information that specifies where to find the words to replace the labels, rather than requiring the code to be modified, compiled, tested, and run simply to find nouns in a different file or website. This kind of file is often called a .properties or property file. As shown here, it simply associates a label with a source of replacements for that label. Hello, let's summarize what you've had the opportunity to learn studying our GladLib example of creative storytelling. The GladLib.java program read templates from files or URLs to create engaging stories about any topic you chose. We used the program to motivate the need for and study of the ArrayList class. ArrayLists are like arrays in that they support indexing to individual elements, but an ArrayList object can grow as needed rather than being a fixed size. We also used the program to motivate the study of the HashMap class. HashMaps are very efficient structures that can be used to associate keys with values. You saw two examples of this in the study of the GladLib class. We also used the GladLib class as a small case study to understand that creating code and programs that are extensible is a good idea, but requires thinking, planning, and experience. We used the GladLib class to study the ArrayList class. An ArrayList object is an indexable collection of elements. ArrayLists store objects, not primitive types, so you can store Integer
objects but not int values. This means you often need to update ArrayList Integer values in two steps: first getting the value and updating it, and then putting that value back in the list. To use the ArrayList class, you must import it from the java.util package. In contrast, you don't need to specify a package for arrays. Useful methods you've seen include add, to add a new element to the end of the ArrayList; size, to determine the number of elements in an ArrayList; get, to access an element by its index; set, to update an element using its index; and indexOf, to determine where an element is stored in an ArrayList by index. You can write code to loop over all elements in an ArrayList by either using an ArrayList object as an iterable, or using an int for loop starting at zero and looping up to, but not including, the size of the ArrayList to access each element by its index. We also studied the HashMap class. A HashMap object is a collection of key-value pairs. The keys serve as mappings for accessing the values, hence the name HashMap. Both keys and values are objects, so you would use Integer rather than int, just as you did for the ArrayList. Keys are best as immutable objects like String or Integer. Keys must be unique. Values can be any object type, including String or ArrayList as in the examples you saw. You import HashMap from java.util, just as you do for the ArrayList class. In the examples, you saw the methods put, to add a key-value pair to the map; size, to determine the number of pairs or keys in a map; get, to access a value by its key; keySet, to iterate over all the elements; and containsKey, to determine if a key is in the map. Looping over all elements requires an iterable over the key set of the map. You can't access individual elements one at a time by an index as you can with the ArrayList. Two different collections with two different sets of strengths. We hope you'll have fun using them. Now you're going to write some programs to analyze web server log files. Most major web servers log each access to a file, which
records who made the request, when the request was made, what the request was, and how the server responded. So why would you want to analyze web server logs? A web server's log file lets you understand a lot about how your website is being used. You might want to know how many people are visiting your site. Is it popular or not? Are many different pages getting traffic, or only a few? Understanding how your site is being used is particularly important if you are trying to make money off of it. Popular pages bring in revenue, while pages that nobody looks at are not helping your business. The log file can also be useful in diagnosing problems, as it will tell you when your server is experiencing errors. If one of the pages isn't getting traffic because a link to it is broken, you want to know so you can fix it. For the rest of this lesson, you're going to work on code to read the contents of a log file. Being able to read the contents of a log file will set you up to solve a variety of problems, such as figuring out how many different visitors have come to a website or how many times each visitor has visited the site. Welcome back. Now that you know about the importance of web server logs, it's time for you to start thinking about writing code that deals with them. The first thing you're going to want to do is to be able to read in a web server log file and represent the information in it in Java objects. To do this, you need to think about two things. The first is: what does all of the information in the web server log mean? And the second is: how do you represent it in a Java class? Here you can see one entry from a web server log file. It has a lot of information, and it's not readily apparent what each piece means. You would want to read documentation about this web server. This particular data came from the web server log for an Apache 2.4 web server, so to find out more you could go to Google and search for "Apache 2.4 web server log file format". If you do that, you'll end up with a lot
of hits, and this first one here gives you a link to the Apache documentation site. If you scroll down a bit, you'll find the information on the access log, that is, the log of accesses to the web server, which is what this web log is. And you'll see that it has information about each of the pieces of this entry. The first is the IP address, that is, the address of the device on the internet which made the web request that is logged here. The next two pieces are both dashes; that indicates that they're missing information. The first dash is for some information about who made this request, which you'll see the documentation says is unreliable: the user's computer could lie about who they are. The second dash is for the username if they're logged in with HTTP authentication, that is, if they've typed a username and password on the website. The next piece of information is the date and time when the request was made. Next you have the request itself, including what type of request it was, in this case GET, where they're asking for a particular web page, and then what page they asked for. Next is the status. Here it says 200, which indicates success. There are many other statuses, which indicate various kinds of success or failure. You may be familiar with 404, which is the very well-known status code that indicates that the requested page was not found. Finally, there is the number of bytes that the server replied with: how much data it sent back to fulfill this request. Okay, now that you've read the documentation and understand what each of these pieces of information means, it's time to think about how to represent them in a Java class. The first thing you need to think about is what type of information each of these is. For the IP address, you could use a String, since you're interested in just the text of that field. Java does have a built-in class for IP addresses, which would give us some more features if we wanted to actually connect to that address, for example, but we don't need that functionality and we don't need to worry about
the complexity that would introduce right now. We don't need to represent the two fields that we've omitted, which have no useful information. We do, however, want to represent the date. You could use a String for that, in which case you would just have its text, or you could use the built-in Java Date class, which understands what dates and times are and how they relate to each other, so you could check if one time is before or after another time. For the request, you can just use a String, and for the status and number of bytes, those are both numbers, so you can use an int. Now that we've thought through these types, it's time to turn this into some Java code. You can see the start of a Java class for a log entry. We've declared a public class LogEntry and written fields based on the types that we just discussed. You should now think: should each of these fields be public or private? Remember that if a field is public, any piece of code can access it, and if a field is private, only code within this particular class can access it. In this particular case, it makes sense to have each of these be private and to design your class to be immutable. Remember from earlier, when we learned about Strings, that immutable means you cannot modify an object once you create it. So you're going to write this class so that each of these fields will be set in its constructor but afterwards can only be read. To make anything able to read these fields, you'll write a public getter, or accessor method, such as these, which will just return the value of that field, but there will be no way to set the value of the field once it's constructed. Speaking of constructing, you need to write a constructor for this class. There are two ways you could do it. The first is to take in this entire string and then have the constructor pull it apart into each of these individual pieces and fill in the fields, or instance variables, of the class. The other is to have the constructor take each piece of information separately and simply initialize the
fields of the object. We're going to have you do the second one, just making a constructor that looks like this, which is going to fill in the fields based on the information passed in, with each piece being passed separately. Why would you do it this way? Well, this gives you a little more flexibility: if you wanted to create one of these objects with other sources of information, you could do so. It turns out that pulling this line apart is actually a little bit tricky, so we're going to give you the code for that. It's a little bit ugly, and we'll package it up into a nice method for you so you can just use it to read the file. Here is the final LogEntry class with all of the things you just learned about. You would use this to represent one of these log entries. As you work with it, you're going to use this class and the code we give you to split a line into its separate pieces, and put them together to make code that's going to read the entire log file. So today what we're going to do is, we have this LogEntry class, and I want to show you toString. So the LogEntry class, notice it has these five fields here, the IP address and so on. We've got our constructor and we've got all these methods written, with toString down here, and you can see that it returns those five fields as one big long string. And what I've done is I've got a tester over here that is creating two log entries, le and le2. I've just kind of made up some information for each of them so they're different, and then it prints out the actual object: it prints out le and it prints out le2. So let's run it and see what happens. So I'm going to come over here. Notice I need to compile both of these. I'm going to compile Tester, and it actually compiles both of them for me. I'm going to now run Tester's testLogEntry, and there you can see it printed out the five pieces of information for each log entry. Very nice. So now what I'm going to do is I'm going to come back to my class, and instead of calling it toString, I'm going to change the
name. I'm going to call it getLogInfo, and compile again and run it, and let's see what happens. So I've got my object, and now I'm running testLogEntry. Oh, so I ran it, but it just printed out something else. Let's see what it printed. It actually printed out the memory location of each of those objects. So let's go back to our tester here, and you can see when I print out an object, it doesn't know how to print it out, so it just prints out its address location. It's not calling the method we wrote, which is called, let's see here, right here, we called it getLogInfo. It's not calling that because we didn't specify it. So let's come back over here and we'll specify it. To print out a log entry with that method, we need to actually write le.getLogInfo(), and I'll just do that for the first object and I'll leave the other one, le2, like that, and let's see what happens. We'll compile it, and we'll come over here and run the tester. And so you can see here, for the first object I called getLogInfo and it prints all five pieces of information, but for the second I just said print the object, and so it just shows the memory location of it. So what's going on here? It turns out that every class has a toString method by default, but it only knows to print out the memory address of an object (strictly, the class name followed by a hash code) unless you actually specify a toString method. So I'm going to change this name back to toString, go to the tester, and get rid of this call, because notice I didn't actually call toString here; I just said print out the object. And again, if I show you what happens here, you don't need the... there we go. So we're just going to print out the object. We're not saying how or anything, but it just knows: go look in my class, and if there's a toString method, that's how I'm specifying how I want to print it out. So again, we call it here. I didn't call toString; it knows. It says, oh, you've got a toString method, so I'll use yours, and it prints out the five pieces of information the way I specified I wanted
the object printed out. I want to show you one more thing. I'm going to come over here. That name toString is very important: it has to be spelled exactly toString with a capital S. I'm going to show you what happens if you change it to a lowercase s, so it says tostring. Then when I come over here and I run it, let's see what happens. It goes back and writes the memory address, because it says: if there's a toString in there spelled with a capital S, it's going to use that, but that's the only thing it looks for. Since I don't have it spelled correctly, it doesn't find it, and so it says, I'll just print out the memory address location. So just remember that all objects have a default toString method; it's just going to print out the object address. Then you can write your own, and so when it runs, it'll see if there is a toString method in there, and then it will call it. That's it for toString. Thanks! Hi, now that you've made your LogEntry class, you need to parse the lines of the web server log to be able to create instances of the LogEntry class. You'll do this by splitting the string into the appropriate fields to pass values to the constructor for the LogEntry class. You can accomplish this task with many indexOf and substring calls. Although this task is not algorithmically hard, the code for it is very cumbersome. For example, for the time portion of the entry, you would need to turn the string into a Date object, the built-in Java class from the java.util package which represents a date and time. Even though both the Date class and methods which parse strings are part of Java, the use of the Date class is complex, especially since the date format in the server logs is not the default format in Java. For these reasons, we've provided code for you which will take a string from the web server logs, parse it into appropriate fields, and return a LogEntry record. To use this, call WebLogParser.parseEntry and pass the string you want to parse. The method returns a LogEntry
object. With that in mind, it's time to turn your attention to starting to write the LogAnalyzer class. For right now, you're going to write code in the constructor to initialize the object, and then write the readFile method. In later lessons, you'll write additional methods that will perform the actual analysis of the log file that you've read in. The first thing you should do is fill in the code for the constructor. The constructor should initialize the records field to an empty ArrayList. You've created ArrayLists in the past, so what you need to accomplish this task should be familiar. The second thing you should do is fill in code for the readFile method. This method will determine the file name to read from and then add log entries to the records field to reflect the information from the file you opened. To accomplish this task, you'll want to make a FileResource for the requested file. You will then want to iterate over the FileResource's lines, and for each line you will use the WebLogParser.parseEntry method to convert the line of text into a LogEntry. Then you'll add that LogEntry to the records field, which as you may recall is an ArrayList. When you've written the constructor and the readFile method, you'll want to test out your code. We've provided a convenient method called printAll that prints all the log entries you've stored in the instance variable records. Remember the toString method that we talked about? System.out.println will make use of that toString method to represent the log entry as a string. Once all this works, it will be time to start analyzing the data you've read in. Happy coding! In this lesson, you learned a bit about web server logs, which can give you a lot of information about your website. You learned what is in them and how the records are formatted as we looked at the Apache web server documentation together, and then you learned how to make a LogEntry class based on the information in that file. As part of that, you learned about toString, which is an important
Java concept, as you will want to write a toString method in many of the classes you create. Finally, you wrote code to read in the log file using the parsing code that we provided. Now you are ready to write some code to analyze the data you've read in. Welcome back. You have already written code to read a web server log file, parsing each line into a LogEntry object and creating an ArrayList of them. Now it is time for you to write some code to analyze the data you have read. The first problem you're going to solve is finding out how many different people visited a website. You don't want to just look at how many elements there are in your ArrayList, since some people may have visited your website multiple times, so you need some way to distinguish requests from different places. You can use the IP addresses recorded in the log file to tell where the request came from. Using the IP address is not perfect, since you cannot distinguish between different people using the same computer, but how many different IP addresses you see is a very good estimator for how many different people visited the site. As you might recall from Programming and the Web for Beginners, an IP address is the address of a device on the internet, whether that device is a traditional computer, a mobile phone, or something else. So what you are going to need to do to solve this problem is take the ArrayList that you have read the log entries into and find out how many distinct IP addresses are in it. Have fun!
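As a recap of the log entry lesson, here is a minimal sketch of an immutable log entry class with its own toString method. It is an illustration under assumptions, not the course's actual LogEntry code: the field names, the getter names, and the sample values in main are all made up for this example, and the Date is copied because Date objects themselves can be changed.

```java
import java.util.ArrayList;
import java.util.Date;

// Simplified stand-in for the course's LogEntry class; all names are assumptions.
final class LogEntry {
    // Private final fields: set once in the constructor, never changed afterwards.
    private final String ipAddress;
    private final Date accessTime;
    private final String request;
    private final int statusCode;
    private final int bytesReturned;

    // Each piece of information is passed in separately.
    LogEntry(String ip, Date time, String req, int status, int bytes) {
        ipAddress = ip;
        accessTime = new Date(time.getTime()); // copy, because Date is mutable
        request = req;
        statusCode = status;
        bytesReturned = bytes;
    }

    // Getters only; there are no setters, so the object cannot be modified.
    public String getIpAddress() { return ipAddress; }
    public Date getAccessTime() { return new Date(accessTime.getTime()); }
    public String getRequest() { return request; }
    public int getStatusCode() { return statusCode; }
    public int getBytesReturned() { return bytesReturned; }

    // System.out.println(entry) uses this method to turn the entry into text.
    // The name must be spelled exactly toString, with a capital S.
    @Override
    public String toString() {
        return ipAddress + " " + accessTime + " " + request
                + " " + statusCode + " " + bytesReturned;
    }

    public static void main(String[] args) {
        // Made-up entries, stored the way the records field stores them.
        ArrayList<LogEntry> records = new ArrayList<>();
        records.add(new LogEntry("1.2.3.4", new Date(), "GET /index.html HTTP/1.1", 200, 500));
        records.add(new LogEntry("1.2.100.4", new Date(), "GET /missing.html HTTP/1.1", 404, 0));
        for (LogEntry le : records) {
            System.out.println(le); // no explicit toString call is needed
        }
    }
}
```

The @Override annotation is a small safeguard: if the method name were misspelled as tostring, the compiler would report an error instead of silently falling back to the default printing.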
You have already written code to read in the entire contents of a web server log, and now you want to find out how many unique IP addresses it contains. As always, you are going to want to approach this problem with the seven steps you use for every problem. We are going to walk through this problem starting at step 1, but we are going to use color names instead of IP addresses. This is really the same problem, how many unique values there are in an ArrayList of strings, but color names are easier to say and make it easier to see which ones we are talking about. So when we look at this list of 10 values, you might be able to tell just by looking at it that there are only 4 unique values, but if this list had a million elements in it, we would like to develop a method that would still work. There are many ways of solving problems in general, and in this case there are a couple that I will mention and I will summarize in code. One method that we won't use is to cross out values that we have seen before as we look down the list. Crossing out values in a list is a problem in Java programming, because if we replace the values in our parameter with other values, we have caused a side effect, something we have spoken about before as a thing to avoid if you can. So instead we will use the idea of visiting every value in turn, which is typical for array problems, and if we haven't seen a value before, we will copy it into a new list. So first I am copying the value pink into my new list, because I haven't seen it before. Then I am going to visit green. I haven't seen it before, so I will copy it over into my new list. Now I am going to visit pink. I have already seen it, so I don't copy it. Then green: I have seen it, so I don't copy it. Pink, pink: I have seen them all. Orange I haven't seen before, so I will copy that into my new list. The next value is blue, which I haven't seen, so I am copying it into my list. Finally I get to pink, the last value in my original list. I have seen it before, so I don't copy it over. Now I look at my
list that I have created. It has four values, and so I am going to return the value four from the method that I am writing, which determines the number of unique values in the array parameter. Developing this algorithm follows a lot of the same patterns you have seen before. You will see that sometimes you add an element to the copy and sometimes you do not, and you will want to think through the conditions under which you add. Once you have done that, you can express the main portion of the algorithm in terms of steps to do for each element of the input. You can work through steps two and three; hopefully you came up with pseudocode that looks like this. Now for this particular problem, you will want to do things just a little bit differently. Remember that you have log entries in the field records, and you want to use the getIpAddress method to get the string out of the LogEntry object. There are a couple of ways to deal with this difference. The simplest is to use the same algorithm, adjusted slightly to reflect the fact that you want to use records, the field in the class, and you want to get the IP address out of each element of records, check to see if that IP address is in the copy, and if not, add the IP address to the copy. At this point you would want to test out your pseudocode. Then you are ready to turn it into Java code. All right, now you have devised the algorithm to count the unique IP addresses in a web server log, so it is time to turn it into code. As usual, we have here the outline of this method with the pseudocode that you have just developed. The first thing we want to do is make uniqueIPs, which starts as an empty list. So we are going to have an ArrayList of Strings, and we are going to call it uniqueIPs, and it is going to be a new empty ArrayList. Now we want to do something for each element, which we are going to call le, in our records. So, as you are familiar with by now, this is a for-each style for loop. Each of these is going to be a LogEntry, and remember that records, even though
we don't have a records variable in here, is an instance variable in our class. So we are going to take each log entry in records and get the IP address out of it, and then we want to know if that IP address is or is not in uniqueIPs. So we are going to say if uniqueIPs.contains the IP address, but we want not that, the opposite, so we are going to put a not in front. Then we want to add the IP address to uniqueIPs: uniqueIPs.add of the IP address. Close that, close that, and then at the end here it says we want to return uniqueIPs.size. Now my braces have ended up in slightly weird places; that is probably just because I have some braces in my comments, so I am going to delete those and then try to make sure my code lines up nicely. So now I am going to hit compile. It says class compiled, no syntax errors. Of course, we want to test this. I have already written a tester here, which is going to create a new LogAnalyzer and read in short test log, which is this log file here. It has this IP address, this same IP address appears three times, we have only seen two unique IP addresses, we have a third, and we have a fourth. Then it uses the LogAnalyzer's countUniqueIPs to count the unique IPs, like we just did, and then it prints out how many there are. So I am going to go over here to BlueJ, I am going to make a new UniqueIPTester, and I am going to run testUniqueIPs, and it prints out four IPs, which is the result that we expected. So we are more confident that our code is correct. Now that you have written code to find the unique IP addresses in a web server log, let us take a brief look at something very close that would not have worked. Here is the code that you just wrote for the unique IP addresses problem. Remember that you have an ArrayList of Strings and that you put the string for each log entry's IP address into that ArrayList. What if you had written this code instead? This code is the same as what you wrote, but the ArrayList holds log entries, and the code checks if it has the current
log entry, not the log entry's IP address. Likewise, it adds the entire LogEntry object if it was not already there. If you were to run this code, it would give you the wrong answer. In fact, it would tell you how many total log entries there are, not how many unique IP addresses there are in the log file. Why is that? To understand this problem, think for a moment about how contains would work. In particular, how does contains know if two objects are the same or different? Contains is going to check if they are equal. What exactly do we mean by equal? Java has two different notions of equality. To illustrate this, consider the situation in which you have three String variables, A, B, and C, which refer to two different String objects. A and B refer to the exact same String object, so they are clearly equal. A and C, however, refer to different String objects with the same logical contents. On the one hand, you could say they are equal, because they talk about strings that mean the same thing. On the other, you could say they are not, because they are talking about different objects. These are the two different notions of equality that exist in Java. The notion of equality meaning the exact same object is what you get when you write ==. If you write A == B, then Java checks if A and B refer to the exact same object. Since they do, this expression evaluates to true. However, if you write A == C, then Java again checks if A and C refer to the exact same object, but because they do not, this expression evaluates to false. The other notion of equality, do they mean the same thing, is done with the .equals method. If you wrote A.equals(C), then Java would call the .equals method in the String class, which checks if the two strings have the same sequence of characters. Because these two strings have the same sequence of characters, A.equals(C) would evaluate to true. So how does Java know whether two objects have the same logical meaning? Each class defines .equals to specify
what it means for objects of that type to be the same. If you do not write one explicitly, the default behavior is for the .equals method to check if the two objects are == to each other, that is, if they are the exact same object. So now that you know about == and .equals, should you write a .equals method for LogEntry? Well, the first thing you should do is think about when two log entries are logically the same. What about if they have the same IP address? While that would fix the broken code for this particular problem, it is not a good approach in general. It does not actually match the notion of two log entries meaning the same thing: two different requests are not the same even if they came from the same computer. So what if you checked more information? What if you checked for the same IP address and the same request string? Even this would not really mean the two log entries are the same, as the same computer could ask for the same page many times. For log entries, it makes sense to say that they are logically the same only if they are in fact the exact same object. Because the behavior you want is the default for .equals, you do not need to explicitly write it. Since you do not need to write a .equals method for this class, we're not going to delve into how to do it yet. Fully understanding what goes into a .equals method requires a little bit of knowledge that you will not get until the Principles of Software Design course. However, now that you understand the ideas of equality, and that the contains method uses .equals to check if two objects are the same, you can understand why this code did not work and why this code, where you use the IP addresses, did work. Thanks. Great, now you have completed your first analysis of the web server log file data that you read. You wrote code to figure out how many unique IP addresses there were, which gives you a good idea of how many people visited your site. Doing this gave you some great practice with array
lists, which are useful for solving a wide variety of problems, and you also learned about Java's two different kinds of equality: ==, which tests if its operands refer to exactly the same object, and the .equals method, which checks if two objects have the same logical meaning. Each class can define this method however it needs to, so that it can provide the appropriate definition of having the same logical meaning. Now it's time for you to solve a new problem with web server logs. For this problem, you'll be writing a program to determine how many times each user visited your website. This information can be quite useful for understanding how people use your website. Do the same people visit your site repeatedly? For example, this person keeps coming back to the site again and again; her IP appears many times in the log file. Seeing the same visitor repeatedly coming to your site might indicate that they find the website useful, because they keep coming back. By contrast, this person visited the site once but never came back. Maybe it wasn't what he was looking for, or maybe he found the site hard to use. If most users only visit your site once, it might suggest that there are some issues you would want to look into. Of course, as with all problems, the techniques you're going to learn are useful in a wide variety of other problems, not just those dealing with web server log analysis. So let's dive right in. As always, you're going to apply the seven steps to solve this problem. As with the previous problem, we'll observe that the problem is fundamentally the same no matter what strings you use, and use strings that are easier to talk about than IP addresses. We'll work through this using some animals. We want to count how many times each animal name appears in this list: cat, snake, t-rex, snake, cat. Now, you start by working through this example in a step-by-step fashion, examining each animal and keeping count of how many times you have seen its name. When you have finished looking at
each animal in the list, you have your answer: cat and snake both appear twice, and t-rex appears once. Now that you have worked an example, it is time to think about and write down exactly what you just did. The first thing was to make an empty table where you could keep track of each name and how many times you had seen it. Saying you made a table is fine, but it is also good to think about what this means in terms of actual data types you can use in your program. What type have you seen that is useful for representing this kind of information? Yes, a HashMap that maps Strings (names) to Integers (counts). Once you realize that this is a HashMap, you may as well call the columns by their technical names, key and value. Finally, you would want to give this HashMap a name so that you can refer to it easily. We will call it counts. Next, you looked at the first string in the list, cat. Then you looked for cat in counts and saw it wasn't there, so you put it in with a value of one. You did a similar step for the second string, snake, which was also not in counts, and for the third string, t-rex. For the fourth string, snake, things were a little different. When you look in counts, you see snake is already there with a count of one, so you update the count to be two. Similarly, for the last string, cat, you find that it already has a count of one, so you update it to have a count of two. After all of that, the entire HashMap counts is your answer. That leads to these steps for this particular instance of the problem. So now it is time to find patterns and generalize to any instance of the problem. Notice that there are several steps that did not find the current string in the HashMap. In each of these cases, you put it into the HashMap with a value of one. Why one? Do you always want one? Or should you look for some other pattern?
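As a rough sketch only, the instance-specific steps for cat, snake, t-rex, snake, cat could be written out directly against a HashMap, with the values hard-coded exactly as they were worked by hand. This is deliberately not a generalized algorithm. The class name AnimalTrace and the method name trace are made up for illustration; only the HashMap named counts comes from the example.

```java
import java.util.HashMap;

// Hypothetical class, for illustration only: replays the hand-worked example.
class AnimalTrace {
    // Each put below mirrors one step of the worked example, with the
    // value written down literally (1, 1, 1, 2, 2) rather than computed.
    static HashMap<String, Integer> trace() {
        HashMap<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("cat", 1);    // cat was not in counts, so it goes in with 1
        counts.put("snake", 1);  // snake was not in counts either
        counts.put("t-rex", 1);  // nor was t-rex
        counts.put("snake", 2);  // snake was already there with 1, so update to 2
        counts.put("cat", 2);    // cat was already there with 1, so update to 2
        return counts;
    }

    public static void main(String[] args) {
        // Prints the three keys with their counts; HashMap does not
        // guarantee any particular order for the entries.
        System.out.println(trace());
    }
}
```

Writing the steps this literally makes the repeated shapes easy to spot: three puts with a value of one for names not yet seen, and two puts that replace an existing value.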
If you think about it for a moment, you will realize that here you always want one: it is the first occurrence of that name, so you have seen it once. There are also some cases where you already had that particular name in the HashMap. In these cases, you updated the value to be two. Again, you should ask yourself: why did I use two? Do I always want two, or is there some other pattern? In this case, you do not always want two. Instead, you want the old value plus one. That just happens to always be two here because of the specific example you worked with. With all of that in mind, you should be able to generalize these steps and come up with an algorithm that looks like this. It makes an empty HashMap, then iterates over each string in the input and checks if that string is already in the HashMap. If not, the algorithm puts that string into the HashMap with a value of one, and if it is already there, it updates the value to be one more than the old value. After processing all the strings, the answer is the HashMap counts. Now you want to test this algorithm out. Test it on the input fish, dog, fish, fish. The algorithm got you the right answer, so you can be more confident that it is correct. Before you turn this into code, we are going to remember that our input is not actually a list of strings, but a list of log entries whose IP addresses you want to process. This slightly adjusted algorithm is basically the same, but we have changed this variable name to le, to stand for log entry, and are iterating over the contents of the ArrayList records in the LogAnalyzer class. Then you want to get the IP address out of that log entry and use that as the string that you use to update the HashMap. Now it is time to turn this into code. We'd like to find out how many times somebody visits a website, so we are going to look at a log file that has IP addresses and see, for example, for one IP address, how many times that IP address appeared in the file. That's how many times that person visited the website. So we have a program here,
a class called LogAnalyzer, and we are going to write the method countVisitsPerIP: how many times does each IP address visit a page? What we have in here is that we're going to put log entries in an ArrayList called records. So we have a constructor that initializes that ArrayList. We then have readFile, which is going to take the file name of the log entries, and that is going to allow you to select a log entry file, and then it just goes through, reads all the lines, and puts them into records. So we're going to focus on writing countVisitsPerIP, and we're going to use a HashMap to do that. The first thing we're going to do is make an empty HashMap, and we're going to be mapping a String to an Integer. So for each IP address, which is a string, we're going to map that to the count, which is the number of times that IP address appears in the file. So first let's make, let's see, a HashMap of type String, which is the type of the key, and of type Integer, which is the type of the value. For the HashMap we need to give it a name, so we're going to call it counts, and then we have to create a new one, so a new HashMap of type String for the key and Integer for the value. There we go. Now that we have it, we want to iterate over all the records that we have, so we're going to use a for loop. For each LogEntry we need a variable name, so I'm just going to call that le. We're going to iterate over records, where we put all the records from the file. There's our for loop, and now we're going to look at each log entry one at a time. So first, what we'll do from the log entry is get the IP address. We need a variable for that, and that is going to be of type String, so we'll call that variable, say, ip, and we will use the log entry to get the IP address. If you remember, we have getIpAddress, and that should get us the IP address. Now that we have the IP address, we have to see if it's already in our HashMap or
not, so we'll have to ask a question about that. We're going to ask counts.containsKey. Again, counts is our HashMap, and we're using the containsKey method, and we will ask if that IP address is in there or not. I'm actually going to ask first if it's not in there, so we'll put the not there. If it's not in there, then that means we want to put it in there for the first time, and when we put it in there for the first time, its count is going to be one. So we'll add code for that: we get the name of the HashMap, which is counts, and we'll use put, and we need to put in the IP address, which is the variable ip, and we put in one for the count. Now, if it's not in there, then we'll want to, sorry, if it is in there, then we know it's already in there: we need to get the value out, we need to add one to it, and we need to put it back in. So we'll do that now. Essentially we're going to use counts.put, but what we're going to put in there is the IP address again, basically replacing its value: we have to get the old value out and add one to it. So we'll use counts.get to get the old value, and then we'll add one to that. And then what else do we have to do? We come down here. If we get to the end of that loop, we've looked at all of our log entries and put them into our HashMap, and essentially each IP address is in there once, with the count of how many times it appears in the file. Then we can return the answer, which is just counts. So we'll compile this, and, let's see, we've got to get getIpAddress correct, so we'll fix that, and we've compiled it with no syntax errors now. All right, so now we want to test this out. Again, countVisitsPerIP is going to return the HashMap that holds all the mappings of the IP addresses to their counts, so you get the whole thing. We need to test it out now. In order to test it out, we're going to create another class here, which I've started, called CountTester, and what we'll do in
here is first create a LogAnalyzer, the class that we just had. So we'll create a LogAnalyzer object, we'll just call it la, and we have to create a new one. OK, so now we have a LogAnalyzer object. We'll need to pick a file to read from, so I'm going to use readFile, and I happen to have a very small test file to convince myself that this actually works. It is called short test log, and I'll show you that in a minute. Then I want to call countVisitsPerIP, and remember, that's going to return a HashMap, so I need to have a HashMap variable to put it in. So I'll do that: a HashMap of type String, Integer. I'm going to call it counts, and that is going to be assigned the value returned by our LogAnalyzer, which is called la, calling the method we just wrote, countVisitsPerIP. Once we have that, we can just print out the HashMap that we created, so I'll just have System.out.println on counts. Let's see if this compiles. We're missing something. Let's see, we used the wrong kind of quotes, so we'll put double quotes here and try compiling it again. And we forgot the semicolon at the end. All right, good, so that works. I want to show you over here my simple file, and then we'll run it. So I have this very short test file, and you can see it's got one, two, three, four, seven log entries in there. You can see the second entry, which is 152.3.135.44, is in there three times. Now we're going to run it, so we'll come over to BlueJ. Everything's compiled; we'll run CountTester. We've got to create the new object first, and we'll run it, and there we go. You can see, if I can put both of these up here at the same time, there we go. So we created a HashMap, and you can see that the one that starts with 152.3 is listed as being there three times. That is, we print out the HashMap, and it prints out each IP address equals its count that we came up with. And so the second entry here is
152.3.135.44 with a three, and you can see the 157.551 is only in there one time and we got one, and you can see the 152.3.135 is in there twice, and you can see we counted it twice. So now I'm more convinced that what we wrote is correct. All right, happy coding. Think back to the problem where you found how many unique IP addresses there were in a web server log. That problem was essentially finding out how many different strings there were. If you look at what you just did, finding how many times each string occurred, you may realize that you have solved a larger problem, and that the answer to counting unique strings is already here. You have each unique string in this HashMap as a key, and would just need a way to turn it into the answer you want. Your counting algorithm has already done the hard work; you would just need to be able to extract the answer from the HashMap. This situation is common in programming: you may write code to solve a more complex problem and then be able to solve a simpler problem easily by using the more complex algorithm to do the hard work. Recognizing such situations can be quite helpful in becoming a highly productive programmer. In this case, using the HashMap from the second problem to solve the first problem is easy. HashMaps have a .size method, which tells how many key-value pairs the HashMap has. As each key appears once in the HashMap, the result from .size tells you exactly how many unique keys there were in the input. If you had written countVisitsPerIP first, you could have written countUniqueIPs with just these two lines of code. The first line uses countVisitsPerIP to solve a larger problem, and the second line uses the .size method in HashMap to turn the answer from that problem into the answer to this problem. The size of the HashMap is exactly the number of unique keys, which is the answer to this problem. Whenever you are programming, try to think of ways to use code you have already written and tested. Thank you. Congratulations, you have solved
the problem of counting how many times each user has visited a site. You solved this problem using HashMaps, which you learned about previously and should be becoming quite proficient with by now. That proficiency is great, as HashMaps are a really useful data structure for solving a lot of different problems. You also saw how you could use the solution to this problem to solve a different problem easily: you could just take the size of the resulting HashMap to figure out how many unique IP addresses there are in the log file. Being able to realize when the solution to one problem can be used to solve another is a very useful skill to develop as a programmer. If you think back to the start of this course, you learned a little bit about cryptography and implemented the Caesar cipher. Now you are going to learn a bit about the Vigenère cipher, which historically is quite important, as it was thought to be unbreakable for hundreds of years. However, as you are going to see and do, the cipher is quite easy to break with a computer. Now let's see how this cipher works. The key in Vigenère was classically a word; for example, here we picked dice as our key. You write down the word repeatedly to match the message length. Each letter represents a number for how much to shift by, so dice means shift by 3, 8, 2, and 4, repeatedly. In your cipher program, it would be quite convenient to represent the key as an array of ints. Now, to encrypt, you shift each letter by the amount written under it, much like you did in a Caesar cipher, but each letter gets shifted by a different amount. The first letter is m, which has 3 added to it, so you get p. The second letter is e, which has 8 added to it, so you get m. Then you repeat this process with the entire message. As we did for Caesar, we'll have to skip anything that's not a letter. Notice that conceptually this cipher is like 4 different Caesar ciphers: one with a shift of 3, shown in blue; one with a shift of 8, shown in red; another with a shift of 2, shown
in green; and a fourth with a shift of 4, shown in purple. A programmer who has already written an implementation of the Caesar cipher could make use of it to help implement a Vigenère cipher. In fact, you could make an array of CaesarCipher objects, one with each shift specified in the key, and use them for your encryption. If you did something like this, you could use the mod operator to wrap a count into the pattern 0, 1, 2, 3, 0, 1, 2, 3. In this mini project, we're going to give you the code for a Vigenère cipher, and you are going to write the code to break it. Your goal is to take messages that we have encrypted with Vigenère and find the decrypted message without knowing the key we used. You will start with breaking a message that you know is in English and then expand your program so that you can try to break encryption in a variety of languages. The first step of your Vigenère breaker is going to be to write code which only handles the case when you know the key length and are working with a single language, like English. Whenever you're developing software, it's a great idea to implement one feature first and test it thoroughly before you build more features into your program, and that's exactly what we're going to be doing here. Remember that the Vigenère cipher acts like several Caesar ciphers within a message. Here you see a message which was encrypted with a Vigenère cipher whose key length is 4. We've colored the letters based on the part of the key that was used. If you take only the blue letters, you could use the Caesar cipher cracker that you wrote previously to find that part of the key. These letters are basically just a message encrypted with a Caesar cipher, but they're spread out through the total message. Then you could do the same thing for the red letters, and similarly for the green and the purple letters. This is the conceptual idea of how to break a Vigenère cipher if you know the key length: you slice the string up and you break each slice using a Caesar cipher cracker. So what should you write to implement this? The
first method you'll want to write is sliceString, which takes three parameters: the message to slice up, which slice you want, and the total number of slices. For example, if you call this method with four total slices and whichSlice equals zero, you would get the blue letters put together into a string like this. Similarly, whichSlice equals one would get the red letters; as you see here, the newline character counts as a character in the string. With whichSlice equals two, you'll get the green letters in the string, and finally, whichSlice equals three would give you the purple letters. Again, you see the newline character. We have some advice to help you write sliceString. First, remember the StringBuilder class you learned about earlier. It will be useful as you build up the resulting string to return; you'll want to append characters to a StringBuilder object. Second, you will likely want to make use of counting for loops in ways that are slightly different than you've seen before. You've typically had for loops start at zero, but they can start at any number. Here you see an example of a counting for loop that starts at four. Of course, you could use a variable or a parameter to indicate where you start counting; that will prove quite useful in this method. You can also have counting for loops that count by something other than one. This loop counts by seven, so it would print the values 4, 11, 18, and 25. Of course, you can count by a variable or a parameter instead of the constant seven you see here. That may be very useful as you write sliceString. Once you've written sliceString, you'll want to write the method tryKeyLength. This method finds the Vigenère key for an encrypted message, assuming that the key length is klength, the parameter. It also takes a parameter for the most common character in the language. This parameter is for later; for now, pass it e. When you write this method, you'll want to make use of sliceString, which we just discussed, and you'll want to use the Caesar
cracker class. We've provided you with a version of CaesarCracker that's similar to the one you wrote before, but with a few changes. First, we separated out the code that finds the key from the code that decrypts the message, and second, we've made a constructor which takes the most common letter in the language that you're working with. In this part of your program, you'll just pass the mostCommon parameter. This method should return an int array of length klength, which holds each of the shifts that the CaesarCracker found for each slice of the message. After you've written tryKeyLength, you have one more method to write for this part of your program: breakVigenere. This is the method you'll want to call from BlueJ. It will set everything up and will call the method tryKeyLength. You'll want to use a FileResource object to read in the file that you want to decrypt. FileResource has a useful method, asString, which reads the entire file into a string for you. Once you've read the entire file, you'll want to call tryKeyLength, passing the message you just read, the key length, which is given to you at this stage, and the letter e, which is the most common letter in English. tryKeyLength will return the key as an array of ints. You'll simply pass this to the constructor for VigenereCipher, and you'll make use of its .decrypt method to decrypt the encrypted message. Finally, you'll print out the result. Voilà, you are done. Now you've written the code to break Vigenère if you know the key length, but what do you do if you don't know the key length? Well, you could just try some different key lengths. You could use the code you just wrote and pass it a key length, then see what the result is. Maybe the key length is one. Hmm, that output is incomprehensible; one is probably not the correct key length. What if you try two? Hmm, that doesn't seem to be right either. How about three? Oh hey, this output looks like it could be English; it has readable words. It would seem this message was encrypted with a key
length of three. You could write a loop to try many different key lengths: start at one and count up from there. The computer can try one particular key length in a fraction of a second, so even if a message which is thousands of lines long is encrypted with a key length of 92, the program can do it in no time. But how do you tell if the key length is right? Do you really want to look at the output for each iteration to see? That's what we did a moment ago: looked at the output to see if it was meaningful text. But maybe if we think carefully about what we just did, we can find a way to automate it. As you look at the incorrect decryption here, think about how you knew it was not right. This group of three letters forms a word, but j-o-w is not a real word. Likewise, y-t isn't a real word, nor is y-o-b. None of the words in this message are actual English words. Contrast that with the correct decryption: h-o-w is the word how, d-o is do, and y-o-u is you. All of the words in the correct decryption are actual words. This observation leads to the idea of how to figure out which key length is right: count the number of real words in the output. You would start by reading a list of English words from a file. You could store the list of words in an ArrayList, but we are going to recommend that you use a HashSet, which will give you the same functionality but work much faster. Then you try the decryption for various key lengths. Start at one and count up. Where do you end? Well, you could just count up to the length of the message, although if the key is close to the length of the message, you won't be able to break it, because there won't be enough letters to do a meaningful frequency count. For this mini project, just count up to 100. Then, for each key length you try, see how many of the words in the decrypted text are actual words from the file you read in. Once you have done that, choose the key length, key, and decrypted text which give you the most real words. As we just mentioned, using an ArrayList would work just fine. As you read in
the file, you would use the .add method to put each word in the list. When you want to see if a potential word is actually a real word from the list, you would use the .contains method. A better option is to use the HashSet class. Like ArrayList and HashMap, you can use a HashSet containing different types of data, so you need to put String in angle brackets after HashSet: you want a HashSet of Strings for this problem. You will use the HashSet in much the same way as you would an ArrayList: you can call .add and .contains on it. You cannot index into a HashSet like you can an ArrayList, but if you wanted to iterate over one, you could do that with a for-each loop. That is what has been happening behind the scenes when you iterate over a HashMap's .keySet. The main advantage is that .contains will be much faster: instead of searching through every word in the HashSet, it can look at only a few words to figure out if it contains the requested information or not. The last thing that we will mention before you start coding is how to split a string up into individual words. You have already seen that String has a .split method. You can use that, passing in "\\W", that is, quote, backslash, backslash, capital W, quote (the backslash is doubled inside a Java string literal). This asks the split method to divide the string up at every character that is not part of a word, such as spaces and punctuation (letters, digits, and underscores count as word characters). Once you have done that, you can iterate over those words with a for-each loop like this one. OK, great, now you have cracked Vigenère for English messages of unknown key length. But what if the messages were in other languages? What if you did not know the language? Well, you can use the same techniques and just try each language. For each potential language, you would need a list of words in that language, and to know the most common character, which is not e for all languages. Then you can try breaking the cipher for each language. You would use the same code you already wrote to find the key length that gives you the most real words in each language. This
gives you the best choice for that particular language. Then you just see which language results in the most real words, that is, which language's best decryption is best overall. To do this, you're going to read the words in for each language. You can make use of the readDictionary method that you already wrote. As you read each dictionary, you will want to put it into a HashMap whose keys are Strings and whose values are HashSets of Strings. In particular, the key is the name of the language, and the value is the dictionary that you read in with readDictionary. Such a HashMap would conceptually look like this. Here you have a table where the keys are the names of languages; the values are then the sets of words in each language. You might think that this type looks a little complex: you have multiple sets of angle brackets nested together. However, it is a great example of an important programming principle. Do you remember the idea of composition? If you took our course Programming and the Web for Beginners, you learned about it in the context of HTML: the idea that programming languages allow you to put small pieces together to make larger pieces, and that they obey the same rules when you put them together. This principle lets you make and understand large and complex things. In this case, you can understand this complicated-looking type by understanding the pieces. So what exactly do you need to make your Vigenère breaker work for unknown languages? You need to write two new methods: one that finds the most common character in a HashSet of Strings, and one that tries the different languages. You also need to modify two old methods: the breakVigenere method, which is what you run from BlueJ, and the breakForLanguage method. You need to write the mostCommonCharIn method to account for the fact that e is not the most common letter in all languages. So you will count the frequency of each letter in the HashSet of Strings which is the list of words for that language. You have seen
and done many problems with counting how often you find something and figuring out which of something occurs most often, so hopefully you are getting proficient in these skills by now. The other new method will try each language in the key set of the HashMap of languages. You will want to use breakForLanguage to do the work of trying to break that one particular language. You will want to use .get to get the word list out of the HashMap to pass into breakForLanguage. You will then want to see how many words you ended up with in the string that breakForLanguage returns. Fortunately, you already wrote a method that does that. You will then want to pick the best language, the one with the most words, and its decryption, and print them out so that you know what your program found. You will want to make a few changes to breakVigenere, the method that you call from BlueJ. Instead of just reading one language's word list, you'll want to read many of them. The other change is that you will want this method to call breakForAllLangs instead of breakForLanguage, so that it tries all of the languages that you read in. You will also want to make one small change to breakForLanguage. Up to now, you have just passed in e as the most common letter. However, now you will want to use the mostCommonCharIn method that you just wrote to find the most common letter in the word list and pass that into tryKeyLength. So now that you know the plan, it's time for you to devise your algorithms and write your code.
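As a starting point for that plan, here is one possible shape for mostCommonCharIn, offered as a sketch under stated assumptions and not as the course's reference solution. The method name comes from the lecture; the exact signature, the wrapper class MostCommonCharSketch, the choice to count only the 26 unaccented letters a to z, and the tie-breaking rule (the letter earliest in the alphabet wins) are all assumptions made here for illustration.

```java
import java.util.HashSet;

// Hypothetical wrapper class, for illustration only.
class MostCommonCharSketch {
    // Tallies how often each letter a-z occurs across every word in the
    // set, then returns the letter with the largest tally.
    // Assumptions: accented and non-letter characters are ignored, ties go
    // to the letter that comes first in the alphabet, and an empty set
    // yields 'a'.
    static char mostCommonCharIn(HashSet<String> dictionary) {
        int[] counts = new int[26];
        for (String word : dictionary) {
            for (char ch : word.toLowerCase().toCharArray()) {
                if (ch >= 'a' && ch <= 'z') {
                    counts[ch - 'a']++;
                }
            }
        }
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) {
                best = i;
            }
        }
        return (char) ('a' + best);
    }

    public static void main(String[] args) {
        HashSet<String> words = new HashSet<String>();
        words.add("banana");
        words.add("papaya");
        words.add("kiwi");
        // 'a' occurs six times across these words, more than any other letter.
        System.out.println(mostCommonCharIn(words));
    }
}
```

An int array indexed by letter keeps the counting and the search for the maximum simple. For languages whose word lists rely on letters outside a to z, a HashMap from Character to Integer would be a natural alternative.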