 and what is database normalization and why do you need it and when should you use it? Okay, so in a nutshell, normalization is really just a series of steps that you take or rules that you follow to make sure that a large database or a large table is broken down to very small tables so that those tables will not have any problems when you insert, delete or update data. Okay, we call these errors called anomalies, right? So when you have an insert anomaly, that means when you insert some data into the table, it causes some errors and when you delete a record for the table, it also causes some error as well. And same thing with the deletion. So when you have those kind of issues, we call that table is not normalized or is not normal. Okay, so if you think about a normal table, is a table that has all the data and those data should be only about that table. If any of the data in the table is not about the table, means we'd say that table is not normal, right? It's not normalized, is correct? I guess technical term for that. Okay, so it's a well-structured relation. The term relation here is just another name for table, okay? So when we hear the term relation, it is a table. So here I just saying that a table should satisfy at least the following. And I mentioned this before already that the table should describe one, it only one entity. Okay, if it does not describe that single entity, then that table is not normalized, okay? And then all the rows must be unique, right? So every table should have a primary key, that's the general basic rules. And then the order of these columns or the rows are not important, right? You can order in any way you want and the database structure, it doesn't really matter. Okay, so these three are the three key points that all relational database should satisfy or should have, okay? So let's go down here. Again, we talked about this briefly last time. So when a database is used as a live transaction database system, like the business processes here, these are usually needed to be normalized, okay? And the OLAP type itself for data mining contains a lot of old historical data or making predictions and things like that. You don't have to normalize these. So usually these are not normalized data and they are more useful in that way, okay? But if you want to do processing, live processing your data should be normalized, okay? So that is another reason why, when you should use normalize database. So again, let's skip that, go down here. You already look at this last time already, the different tech terminology. So make sure you know what they are and get used to what they mean. And right here, I just mentioned that above, the three reasons why a database should be normalized is to prevent these anomalies, okay? So the three types, delete, update, and insert mainly. And when you design a database, we talk about this, either approach will be fine. And then we talk about relationships already, right? We did that a few weeks back. I'm gonna skip that and go right down to this part here. Okay, so when you normalize a table, a database, it follows a certain form. We call these the normal forms and they're about six or eight, probably more than that and the normalization process. But typically in a, you know, regular usage or regular database systems is very, very highly unlikely that we're going to normalize a table all the way to the sixth normalization form here, okay? So usually up to the third or the fourth one here, sometimes these two are used interchangeably to mean the same, but a little bit slightly different. But usually we just normalize a table to the third normal form and sometimes just call it three NF, okay? Third normal form, second and first. So your table, all the tables that you create should conform to this rule, okay? Should be in the third normal form. If it's not in the third normal form, you're going to have problems when you insert the lead update data, all right? So I just focused on the first three. The textbook does actually a little bit more by pushing you to do one more step so that your table is actually in the BCNF form, which is between the third and the fourth normal form. So when you're in the third normal form, it should be good, okay? So, and on the right side, it gives you some descriptions for I guess some requirements that if your table is in the first normal form, this must be true, okay? I mean, the value store in the intersection, I mean in the cell, right? In each cell must store a scalar value. The scalar value is going to be a single value. It cannot have two or three values in that cell. Like for example, if a cell is called first name, then it should contain only one first name. It should not contain John and Jack and Joe, right? That's not normal. So if that's the case, you have to fix it. And if you had that, then usually that would satisfy the first normal form. This is not complete, okay? Not only that, you should also have a primary key, okay? So that's the first normal form. Once you pass that, then you can move on to the second normal form, the second normal form. So these are cumulative, okay? In order to satisfy the second normal form, to start from the second normal form, your table must have already met the first normal form. You can't just jump to two, okay? So if this is completed, now you can check the second normal form. That means every non-key column. So you have a primary key already in the first normal form. Now you check. Do all the columns in that table depend only on the primary key, okay? Depending on the primary key, I guess. If that's not the case, then you need to fix that and we'll do the process and go with the process. And then the third normal form here is kind of the same thing here. Actually, it's missing some things in here, but kind of like combining this together here, okay? That's why they put the third normal form is also sometimes referred to as the BCNF, right? The Boyce Cartier referred to the two people who actually developed this process way back in the 1970s, okay? So that means that your table must have a primary key. All the column in the table must depend only on this primary key. And then you do not have any other columns that depend on other columns in that table. Again, we'll look at some of these 10 call transitive dependencies and functional dependencies and so forth in here, okay? And we're not gonna do the fourth and fifth and sixth columns normal form. So these three are the most important one we'll look at in this course. So here is just a list of some of the benefits of normalization, okay? Again, just to prevent data from error having errors, it also makes your database much leaner as you don't have a lot of duplicate data in there. Imagine you have the same number of data inside that each row and all you have is just one single cell that is different, right? That is not cool, right? You have a lot of repeating data. So in the end, it consumes a lot of RAM, I mean a lot of memory space, so we don't want that. Not only that, we want to make sure that the data are actually very accurate and correct, right? So imagine if you enter some, I have a one person enter some data entry into a database table and they have my title as instructor, another one enter my information and they have as faculty and then another one enter my information and they call it staff, right? Which one is it? Am I a staff, a faculty, one instructor or what, right? So again, you have data integrity issues in this case. So you want to fix that so that every time we enter my record, I'm always a faculty or I'm always a staff. There's no other information that can add in there, right? So we avoid that and make sure everything is very accurate. And also when you have a normalized table, it's very fast. The index is when you search and there is much faster and quicker, okay? So that is mainly some of the reasons why we have normalization. And the question is like, when is a table considered normalized? When they ask you this question, the answer is it has to be in the third normal form, okay? It must be in the third normal form. If it's not in the third normal form, it is not normalized, okay? So that is the general rule across the board for all relational database systems. Okay, so that and then here is an example and this note that goes through the process of normalizing a table, okay? So everything blue here are the things that should occur, should be true when a table is in the first normal form, okay? So here's an example table that you might receive like an order team, okay? You have this information here. So this is quite on normalized table, okay? And a very common, I think you will see is like a receipt. Like you order something for Amazon or Walmart and they send it to you to send you the receipt and you look at the receipt, has the information there and all those data can be generated through a table, right? You did that during the first, a second week of assignments where we do some queries, right? I asked you to select data from two or three tables and you get a result set that comes from three or four tables. But data itself is a single virtual table, right? So this is exactly what's happening there. So from here, so if you have a table that design that looks like this in the table as one, two, three, four, five, six columns, okay? And now I ask you to say, hey, Jeff, here is another order. I want you to go into this table and enter the data. So Jeff will go in here, enter order number 104. The date will be today's date and custom number, maybe one of the same customers here. And instead of AMCO, you want to type AMCO company, right? So as you can see, it's a different name already, even though the customer number might be the same. So we have something wrong with the name because I might choose to enter, you know, CO with a dot as opposed to just CO or I might spell the whole word company like that, right? And then when you enter the street in here, you know Jeff might enter one, two, three, four and then you might spelled out the word first as opposed to one ST, right? And then the item orders over here could be something similar, okay? So you can see that it's a problem here already, right? I have a different name when you name the company name. This tree address is spelled differently. So we have some of these issues because it's not normalized, okay? So not only that, if you look at this table here, remember the rule, it says a table should be only about that object, okay? So whatever this table is called, maybe this is called the orders table, okay, order. Now are these information about and only about an order, right? So I guess you have to determine what is in order, okay? So that was what it means. I mean, so if you cannot conclude that these items are about order, then therefore it's not normalized. And so again, some example here, it goes through a process. So if you look at this table, what is the primary key here, right? Is there a primary key? A primary key is the key that is unique, that is used to define or determine all the other columns in this table. So if you look at the order number 100, right? If you, every time we see this number here, if you look to the right of that, all these columns, do you always get the same data, okay? Or you don't, right? If you do, then this is called a determinant or determinant key, but it's also unique, right? That entire row should be unique. That means this row should not be the same as this row here. Okay, so that's the first thing. And you test the first row. Are there any repeating data? Okay, repeating groups here refers to the notion that a cell, this is a cell, a cell should contain only one type of data, right? About this particular field or attribute. So the company name should only contain the company name. So this is okay, legit. The address, it's okay, right? These are fine, but if you look at the items order, you have a bunch of information here. We have a product ID number, the name of the product, the weight, the price, some other descriptions, and the weight more as well, right? So you have all these jumps together inside one cell and that rule fails because the rule says it should be in atomic values only, okay? So therefore we have to fix this problem. And how do you fix that? I fix that by you have to then create or break this cell, it's calmed down into multiple columns and this table so that the product ID is its own column, the name is its own column, the weight and the price or unit price, okay? So now you have like one, two, three, four additional columns can be added to this table so that each cell has its own data. If that's the case, then you can check it again. Now are there any repeating groups and the data? Then if it says no, then this first clause passed. And then we have the second information, right? And so forth, okay? Those are the rules you can test to make sure that everything is satisfied. So it just tells you that repeating data here, okay, so yeah, data is not self-contained, what I just described, I'm sorry. The repeating data, repeating data, repeating groups here, it just means that if you look at this, right? Look at this, every row here. The first row I have the company name is repeating here. I have it again at one or three, it's repeating here again, right? So when I see that, not only that, if you look at the customer number, it's also repeating here and here, okay? So I'm looking at it and say, hey, this customer number tells me that the name of this company is this AMC, ACME company, right? So again, I can know this company name by looking at the customer number. If I know the customer number, I can know the company name. So in this case, these two columns are repeating, right? It's repeating data and they don't have to be in here all together, so therefore you have to remove one of them. Which one should be moved? Should you remove the customer number and keep the company name? Or should you remove the company name and keep the customer number? So that is something that you have to decide, but you should not keep both of them, okay? That's repeating data. And then this is the atomic values, okay? So those two are violating this table, therefore they're not normalized, okay? So that's why you have to break them apart. And here is the example of you break that into a customer table, and then you create their customer number as the primary key for that new table. You move the name there, you move the address to that table as well. And then in the old table, you drop the, so we decided to drop the customer name and the address because they are related. So these three are related. We drop these two. We keep this one as the foreign key and then we move that to a new table down here and then we reference only by the customer number. So this customer number here is now the foreign key, okay? How do I get that information? Well, if you look at the other table, you join, right? I'm gonna do join, right? Select everything from the orders table, join the customer table where the customer number equals customer number. You get that information, right? If you can do that, then that's the way to go, okay? So we resolve the first problem. The second problem is now we have some of these values that I mentioned earlier. So what you're gonna do is we're gonna split this into multiple columns and then now we have something that looks like this, okay? So we have, you know, initially it was, this whole thing was one big cell, we break it apart and this gets into a one, two, three, four, five, six additional columns to that orders table. Okay, so now it satisfied the rule, right? The rule says every cell must be unique, right? It must have a, now you need, it has a single value type. Now they are and there's no repeating data. That means that this column should be unique about I shouldn't have another column that can tell, you know, that give me clues to the other columns, okay? So if this is the case, then we also have a primary key. And we decided that, can this be a primary key? Okay, remember a primary key has to be unique, right? So we actually cannot just keep this as a primary key because if you look at it here 100, it's repeating, right? Primary keys cannot be repeated, right? It has to be unique. So therefore the order number itself in this case is not, cannot be used as the primary key. So what you're going to do is you have to decide, hmm, do I want to combine this primary key with another column, either the date column or the customer number column? So if you look at the, if I combine the order number and the order date as a primary key, remember you can use two key columns as one to create a primary key. We call that a composite key because they have more than two columns. If you look at this, you know, 101, 102 or 102, it seems unique, but then it determines, it's repeating, right? Repeating here, repeat here again, again, that word doesn't apply. So what about if I take order number and the part number, can that be unique? Maybe, maybe not, why? Because if you look at this number down here, right? It's possible that the same order number can have, well, I mean, I guess it's possible, right? So, you know, if you think, which column or columns can I combine to make a primary key? And you can combine multiple columns. I can combine all of these four, five columns and make this a big primary key. It's not ideal, but you can, right? So if you think about that, is that better or should I create another additional key that doesn't exist in this table? When you create another additional column, I mean column here, we call that a surrogate key, right? A key that doesn't exist in the table here. If you create a new one, that is called a surrogate key and that could be maybe the order ID. And then that will be unique for everything and then that will solve the problem, right? So that's something you have to think about. And so here, and so we, what, and the doing is, there's something like I mentioned, add another column called line number, right? This one doesn't exist, we just call it line number. And then now we combine this line number with the order number, now you make this unique. So there's no duplicate here at all, one, zero, zero, one, one, zero, zero, two, and one, zero, three, right? Now these are unique and it defines everything in this columns, okay? So now we decided that, okay, we're gonna make this as the primary key as a composite key, thank you. So now that solves the first rule. And you move to the second rule. The second rule says, the must meet the first rule. So it has to be true. And then each column must depend on the whole primary key. Okay, so here's the primary key here. If you look at the order number, does it depend on that? Now before I go any further, we'll come back here a little bit later. I wanna take you to another page and look at some of these terminologies first, okay? So over here, on this unit seven, there's another link that takes you to a database design guide, if you wanna follow along, there's find, it's the link here, that will open this table over here, okay? So let's go over here and look at some of these things again. We already talked about this a little bit, but the rule we design databases, is that when should you normalize tables? And yeah, I'm just showing you here, usually when you are presented with a spreadsheet or some really raw data, that contains some data that is not in a table or a very, very broad, large data set, like a text file or spreadsheet. If that's the case, then you have to normalize it, and you have to create a table or tables for these data, then you have to normalize them, okay? That is usually when you should do that. This is another one as well. You can extract some data from existing databases, right? The same similar process. Here, some people give you a spreadsheet or some text file and you build a database. Here you're obtaining some data from an existing database, a collection of from different places, and they give you some raw data. You'll also be in the form of a database, and then you have to redesign that as well, okay? So similar approach. And then other ways to design data, we did this before. This is when you design a new system or new database that has no data. When we did our example, we used an ERD, right? And that's how you do it. So in this is the case, it's much simpler to do that because we don't have any data to think about. All we know is, okay, I need to create table for a user, create table for a game, create table for a level, things like that. So you know exactly what you want already. So usually going for that approach, your database are usually already kind of normalized, okay? So it's much simpler that way. The harder part is when you have some data to play with, okay? So this is another design. Another one would be, you are updating some existing database, okay? You go from here, you migrate into another system or design, then you have the option to either make sure that they are correct or re-normalize them, okay? But the most common is of course, he has some raw data. So here's some terminology again, the term table and the terms table and relation here, it just mean the same thing, okay? It's just different way how you say in different contexts, but it's the same thing. So you can see a table here is like a relation, it's a file, similar. And a column is also known as attribute and this relation type and the field in the file, like a spreadsheet, right? And then a row and a table is called a row and a row and the relation is called a tuple and then a file spreadsheet aligns called record, right? So they all mean the same thing, you are not familiar. So make sure you understand that, okay? So down here, some important relational terminologies. I already put some definitions here for you. I did not do the last two here, but last three, I'm sure pretty simple, okay? So relation is a table, a functional dependency, we'll look at that in a minute. It's when a value of one or more columns, attributes here is a column, determine the value of another attribute, a column, okay? That means that it is dependent on the other columns. A determinant is a column or a collection of columns that determine other columns, determining, we'll look at some math examples, you'll see a little bit better. And then based on these determinants, when you find them, they can be used as what's called a candidate key, right? A candidate key is not a primary key, it just give you some idea that, okay, this table has three or four possible keys that are candidates for a primary key, okay? So a primary key is really a candidate key. And that means a candidate key is also used as a unique key, okay? So you have, if you have a couple of them, how do I decide which one is which one? You have to pick one, okay? And that one that you pick is called the primary key because it's the primary key that used to be uniquely identified each row, right? And then if you don't have a primary key, I mean, you should have it, but if you don't like it, you can create one yourself and we call that the surrogate key, okay? So it's an artificial column that is not included in the table where you add it there just so that you have a primary key. And then foreign key, we already know this, foreign key is a column word key that is linked to the external table, it's primary key, okay? And then referential integrity here, it refers to some constraints. We'll talk about this again next week or the week after that about some constraints that you have. And it has to do with the foreign key. So when you have a foreign key and another table, those two tables have some referential integrity constraints meaning that you cannot delete one table from the other. If you want to do that, you have to do it in the correct order. I kind of showed you last time how we add some data into our tables and I tried to delete some other records and it would not let me do it, right? Because one table is dependent on the other table. When that happens, we created something called the referential integrity. Referential means reference, right? I'm referencing another table data. So when that is tightly referenced or created, you cannot break the bond. You have to do it in the correct way, okay? Normal form, you have talked about that already, is normal form, isn't it? A table is in the three, three and F, but they're normal form. And multi-value dependencies. This just means that a column can be used as a dependent for other columns. If I have like two or three columns that rely on another column, then we call this multi-value dependencies. If that's the case, then we want to prevent that from happening. So it's just some technologies. Okay, so here is another rule for relationships and relations about a normalized table. So all rows contain data about that entity. Again, the entity here refers to actual object of that table. Columns contain data about attributes of the entities. It's kind of similar, right? So all the columns should be about that object described. And all entries in the column are of the same kind, right? Again, if you enter a column called address, it should be an address. It should not be about a company name, right? So it should be of the same type. So if you think about in terms of programming, it's the same data type, okay? Each column is unique, unique name, right? You don't have the same ID and ID twice. Cells, so each cells must hold a single value. A single value of the same type. Like I showed you earlier, like a phone column cannot contain two or three phone numbers, right? Should be only one phone number. And the order is nine points. The row is nine points, right? And then no two rows may be identical, meaning no duplicates, right? Again, so this rule applies to what you call the third normal form, okay? So all your database table should follow this rule. If anyone of these rules is violated, it is not normalized, okay? So you can use this to check when you normalize the table. Down here is the example of, let me make it bigger. Actually, I forgot to do that, sorry. A normalize table, okay? If you look at the table here, you see that all columns have the single data. We call this atomic values or animosity. So if you look at the first column, right? Every data, every cell has only one type of data in there. Here is the comments. So of course it's a text, but it is about one thing only. We don't have like phone numbers or email ID, right? Suddenly, okay? So that rule holds all rows contain about that entity. In this case, it's about employee, right? It's about the individual employee. Is it? If it's not about employee, it validates this rule. So if you look at this employee, Jerry Johnson has an ID number of 100. Jerry is in the accounting department. This is his email, his phone number, and it's a comment about him, right? So they all are about this Jerry. The next question as well. So you can see that all these columns are about that individual only. So that must be true, okay? If it's only have like here, if I have another column called computer, right? And you have like a Mac and it's only have a PC or something, you know, that computer has nothing to do with Jerry because I can switch that to a different computer, okay? Has to be something that can be fixed and or about that person only. A computer could be another thing that Jerry owns or is associated with. Something that is extended from Jerry then that's the case is something external and we call that a foreign object, right? So we have foreign key to map to that item, okay? So all entries in the column are of the same kind. So again, email address are only about email address. We don't put phone numbers in here, okay? The same thing for department and so forth. So that will close, close. And then I put a comment here saying this column here, the last column, we don't have a lot, you know, a lot of data to look at. You might have a hundred employees that all the comments might be included. But here, if you look at this, we have only two people that have comments and the other six, okay, one, two, three, four, five people do not have any comments. In this case, you could, you know, move the comments out to another table and then just reference that or we can leave it as is, it's fine too. Okay, so it's optional, but the comments are about that individual, well, you still, okay? So another one here, another is normalized, similar as above. Only difference is we took out the comment, right? So if you remove the comment, it's actually much cleaner than the one above, okay? It's still the same. And then here is an example of similar above now it's not normalized, okay? So why? Because we have some rules that is broken. So if you look at Tom's information, he has like three different phone numbers and the same cell that violates this rule, right? It says cells should contain only a single value and this one fails. It's also down here, right? Richard also has two numbers. So these two rules, these two cells violate this rule. So you have to fix that. You can either remove and choose only one or what if you want to list all three numbers or two numbers? Well, that's the case, you have to add another column like phone one, phone two, phone three. What if you have three columns for three phones but that only one person has those or two or three employees have only those two phones and the rest don't have them. So again, you're wasting some columns, right? And that's the case, what do you do? Okay, something to think about is another example of irregularities. So again, same idea. All entries in each column must be of the same kind, right? So we had the violation here that the email address column has some facts in the home text in here. Okay, so this is a text only but we store some irregular data in here. So therefore, these are not normal. Not only that, if you look at this data here these are all empty, why is that? Why do we keep this empty cells? Doesn't make sense, right? So in this case, you have to normalize this table. We did this so that we can, you know, fix the last column so that the phone numbers can have its own cell. Right, this is fixed now but then we have issues over here now, right? So in this case, how do you fix it? Okay, so a solution for that would be to create a third table, right? I create a third table, I can call it phone. Actually, I didn't give a name, I call it phone table. And then you have each has a phone ID that is associated in there. And then we add the employee number in here. This is like foreign key, okay? So I put here, okay, so we know it's foreign key. So that foreign key now points to that employee whoever 400 was, I think was, let's see, 400 was Tom, right, Tom has three numbers. This is like an office number, fax number, home number, okay? So same fax and home. So we add in here, so Tom, Tom, Tom, the first phone number matched the phone ID number one is an office number, the phone, actual phone number and the type is office. The second number is fax and third is home, right? And then this person here has office and home number. We have another one here as well. So we can also list the other as well but also those only have just the phone number. And that's fine, right? I didn't list all of them but if you were to add like folks, I'm a richer, right? The last one, 700. And then you put here the phone number ID, okay? So it'll be like, actually richer as two from different table, okay? So we can see that by removing and creating a third table, we solve the problem. Now this table is in third normal form if you think about that too, right? So we have, this is the PK, the ID is the PK here. So that is unique, okay? Unique is there's no duplicate in that column. And now we see that it has a primary key. This is the foreign key and then these are now defined by the primary key and employee number. It makes it unique, right? Okay, so let's look at the functional dependencies. What does it mean by a functional dependency? So it is when a value in one of the cells, okay? One value, if you just look one value at a time, okay? One value or more determines the value of another attribute. So to think into perspective of the math problem maybe it makes more sense. If you look at this example, I've pulled this one in the book by the way and explained to you again. So we have a cookie cost is equal to the number of boxes times the price of that cookie or box, right? So in this case, we say that in order for us to know the cost of the cookie, right? We really have to know these two numbers. Assume that number five is a fixed number, is given, is fixed. Then in order to know the cost of the cookie, we have to know the number of boxes. So we say that the cost of the cookie depends on the number of boxes. Functional meaning that cookie depends on the function of this because this number can change, right? If the number goes up, the price goes up. If number goes down, the price goes down. So if you look at the graph, it looks like a graph that goes up and down. So linear graph, that's why it's called function, right? Functional. So therefore, this cookie cost depends on this number of boxes. So we call this, the cookie cost is functionally dependent on the number of boxes, okay? And you write that in the program in the book, they use the arrow to do it this way. So the number of boxes, okay? The arrow here means determines the number of the cost of the cookie, right? So it says the number of boxes determines cookie cost. And we call this number of boxes a determinant because it determines the cost, okay? So that's what those term determinant means and functional dependency means. So another example is like a grade in the classroom, like in this database class, right? So Jeffrey plus database, you take the class by the end of the semester, you can earn a grade, okay? If you just look at yourself, we cannot associate with you with the grade. What does that mean? If you look at the class by itself, we don't know either. Class got an A, so what, what does that mean, right? We don't know, but oh, this class here taken by this student got a grade. So now we have some information. So therefore, we say this grade is dependent on both the student and the class, okay? So we have a dependency here again, right? So we say this, the student and the class, if you read it the other way around, right? The student and the class determine the grade. The grade is dependent on the student and the class, okay? So we have this functional dependency here as well. Here we have two of them. So we call this the composite determinant. Composite here means like two or more, as opposed to a single determinant, just call it determinant, a composite, all right? So using that idea, if you think about in the table column, one column for grade, one column for student, one column for class, how do you know which one is the determinant, right? Can the grade be determined? Does the grade determine the student? No, we don't know that, right? Does it determine the class? No, it doesn't, right? And if you look at the next one, if you look at the student, if I look at the student alone, by using the student's name or ID does not or name, does that mean, does it determine the grade, right? It doesn't because it doesn't do anything. It has nothing to do with that. The class itself again, doesn't, but if you combine together, then you do have that, okay? So this is a math problem, it's also the same rule when you build a database or you normalize a database table, right? So here's an example from the book. You look at this. So we have something that has some data to play with. When you normalize database tables, if you're given a set of data like this, okay? You have to think a little bit further, further down the road where you have a lot more data. Whatever is given to you, what you see may not be just the thing that you're used to normalize a table, it may not be correct. You have to ask some questions like, okay, can for example, you look at this one here, like the buyer, the buyer is Pete Hansen, Pete Hansen. If you look at this, Pete Hansen is always in the sports, a water sports department, but Nancy Myers is also in the water sports, right? So we know that, okay, so a department can have different buyers, okay? So can a department have different buyers? In this case, yes, so those are the business rules you have to ask and then make sure it works in that way. You can ask the other way around. Can a buyer be in more than one department? So Pete is only in water. Every time you look at this road, you see Pete Hansen, his department is always water sports, water sports. Down here, I don't see it. I look at Nancy, she's also in water sports only. Well, look at Cindy, Cindy is in camping, camping, camping and then Jerry is in climbing. So you can see that you can assume that the buyer can only be in a single department. You cannot be in two departments, okay? So that's something you wanna ask the person who actually gave you the data, say, so can a buyer be more than one department? If they say, yes, if that's the case, then you have something to think about, right? If they say, no, there's only one, one buyer can only be in one department, but they can be in the same department, like Nancy and Pete, then again, some rules you have to think about, right? Okay, so if you look at this example here, the number on the left, just some counters, we ignore that. So we have a SKU number, and if you look at the SKU numbers, are they unique? So if you look at this, every cell, right, they appear to be very unique, okay? So no duplicates. So it's possible that this could be a primary key and then you have the SKU description, these unique. If you look at these, it appears to be that they are unique, right? You don't have any duplicates in here. So no two items have the same description. So it's also unique. If you look at the department here, you don't have uniqueness, right? So this row has both sports and they're repeating. So we already know that this cannot be used as primary key because it's repeating. Same thing for the buyer, right? It is not unique, so in that case. So you look at the first SKU number here, you ask yourself a question. Every column after it, or before it doesn't matter, can, does this information depend on the SKU number? Okay, or does the SKU number determines the description? Like metromethylite, does the number of boxes determines the cost of the cookie? Okay, well, does the cost of cookie depend on number of boxes? So that rule, does the SKU number 1, 0, 0, 1, 0, 0 determine the description of this product? You're thinking about it, say, well, let's see if there's another one down here. And if you look at this, there's no 1, 0, 0, 1 down here. It's unique, so we can say that yes, it does describe this one because this description depends only on this SKU number. I don't have the same description that is coming from a different SKU number. If I do have it, then this rule fails, okay? So we can say that this SKU description depends on the SKU number and the SKU number determines the description. Okay, so this one here has, what about the next one? Does the SKU number determine the department? Okay, so by that, I mean like, if you look at every time in this column, every time we see this 1, 0, 0, 1, 0, 0 and every column here, every row, if you can go over to the department, is it always the same department or not? Okay, if it is always the same, then we say that this one determines that column, okay? So if you look at this, since it's unique, we know that it already determines the department. What about the buyer? Same thing, there is no duplicate, so that rule is easy. We can say that now this SKU number determines the description that also determines the department and the buyer. Okay, so if that's the case, then we also say, make sure it's true, right? Is it true the other way around, right? You have to check with everyone. So we know that SKU number is a possible candidate for a key, okay? And then you look at a description, same rule. Just if I see standard Skuba tank yellow, if you look at the SKU number, it's always gonna be 1, 0, 0, 1, 0, 0. Since it's only unique, it's the same rule as SKU number, right? It determines that one. Look here, it's always what a sports is unique and P is also unique. I don't have any other copy. So this is also a possible primary key, okay? And then you go to the next one. So what I'm doing is I'm determining all the determinants, okay? So I put a note in here, let's see. Look here. Number one is determine all determinants, okay? All right, so we determined that SKU is description and also determines the department. Also determines the buyer. So we learned that, right? And the SKU is also the same thing. So we say, SKU number, SKU description is also determined that, determined the department and also the buyer. So these, right, is unique. So now we look at that department. That's the department called the same rule. If I look at, if I see water sports, do I always see P, Tencent? This, yes, yes, yes, yes, yes. And suddenly no, right? It's not true here because I see Nancy now. So you can see that this department has not determined a buyer, I guess. If you look on the right side, I see a yellow, yeah. Yes, that's true. But the next one here is different, right? Here is different. So again, that's not determined description. It does not determine the SKU number as well because it's different every time. Notable to speak true, it must always be the same thing, right? Every time we see the duplicate here. So we know that the department does not determine any of those. So this one actually failed, right? Let's say that. What about the buyer? See, let's look at the buyer. So that's the buyer determine the department. If you look at Pete Hansen, you always see water sports. Pete water sports, yes, yes, yes. Nancy water sports, yes, Nancy water sports, yes. So you can see that all these buyers, you always see the same department. So if that's the case, we say that the buyer does determine the department. Okay, what about the description? Can they determine description? If I see Pete, I see the yellow scuba. I see Pete, oh, I see magenta. No, right? That rule failed because it's different. Okay, so it doesn't determine that SKU description. What about the SKU number? Again, it does not because the first Pete here is 1, 0, 0, 1, 0, 0. The second occurrence, it shows a different SKU number. So we say that it does not determine the SKU. So in this case, the buyer only determined the department. Okay, so we have these possible. So this one doesn't work, we can pass that out. How does that get, right? So we can move that. So these are the possible determinants. Okay, these are determinants. The SKU description and the buyer. Okay. So if you look at this, the SKU determines all three of them. This description also determines all three of them, but the buyer only one column. It is not described the first two column. So we know for sure that this is not a possible for a primary key, right? Primary key has to be unique for all the other columns. So again, SKU number, if you look at this down the list, right, it's all unique, there's no duplicates at all. There are no duplicates, it's a potential for a candidate key. The SKU number is also unique all the way down. So it's also a candidate key. So we say these two are candidate key. Okay, this is also a candidate key. What does that mean, candidate key? Candidate key is potential for a primary key, okay? So this is not, this is not a candidate key because it does not describe, it does not determine all the other columns. So only these two here. So now we have to decide which one should be used as the primary key, okay? If you look at this, then we know for sure that this is a better option for a primary key, right? It contains just some unique numbers, as opposed to some text, okay? So this is, you know, it's okay. You'll see that it's possible, but it's prone to having errors because I might accidentally type, you know, yellow or something twice in here or I might run out of options, right? Because, you know, colors are there's only so many colors in here, okay? So the better option will be a skew number, okay? So this is the best option for a primary key, okay? So that's how you determine, how do you determine determinants and also using the functional dependency process, okay? So by notice formation, we got some information here and then I put some information as well down here and then just some more example how to design databases using the ERD. Okay, so using this rule, we're gonna come back after the break and we'll do example, how we can normalize an unnormalized table. So I'll see you guys in 10 minutes and we'll continue, okay? Would you be able to talk with me for just a little bit in a breakout room during this break? Thank you. Okay. And we will begin. So now I'm gonna go over over here and if you click on this link, the unnormalized sample table. And so before we look at this, I'm gonna look at it. On the day, on the day, I want to go right now. On the day, on the day. Yeah, I clicked on it and I can see it. Okay, let me give you another link first, give me a second. It's in Google Sheets. All right, I'm gonna give you a link in the, well, I'll be, it's okay, I'm just gonna show you here. Where is it at? Okay, here's an example of another possible type of data that you can use to normalize. I'm not gonna normalize this one, but I just want to show you that an example of some data you can receive from a customer or from your company for whatever reason. If you look at this table here, this is a sales order, okay? If you get this type of data. And, you know, and, okay, I'm gonna just do it. Okay, so if you have this type of data here, right? And you are a database designer, and you're going to, you know, create a database for this, what would it look like, right? So a common sales order, you have the company name, the address, the email, if I remember, then you have the customers information here. We have an address for the customer, and we have a shipping address for that customer as well. Maybe the same company, maybe a different company. We don't know. All we know is that we build this person here and then we ship it to this address. And there's some information about this invoice, number of the date of this invoice and the customer ID. So this ID here refers to this client down here, right? And there's some information about the sales person who actually made this transaction with the customer. So Michael here is from the sales department and the shipping information is here. And then we have the actual items or products that have been sold to this customer. And then we have some, the unit cost, discounts with some items, the total cost for each of those items. Down the bottom, we have the tax, the total discount. And then we have the subtotal, the tax rate, and then the total cost down here, okay? So something like this, if you look at this, I put a link down here as well, and I'll show this with you later, that you can actually, where to build a table, a single large table for all this information, okay? It will look something like this. So I merge that over, I create a table here. I have just a number, it's just some counter number here. The first order, for example, the invoice number goes right here, the invoice date, the vendor, it would be like the comptile source company name. And then we have the address for the company. As you can see, it's all in one box, one cell. So normally, again, you want to break that out. So it's quality valuable number one. But for now, we have the vendor's phone number, which is this number here, the email address goes here, and then we have the customer ID. That's just the one right here. And then the customer name is different here. And then the company name and so forth. And then we have the shipping person. It's this record over here. So I can scroll that to the right and I move that home as you can see a little bit better. And then over here we have, you know, the shipping company information goes here. On the right side of that, we have some more records going. So it's a really, really long table, right? You can see that has a lot of columns. All the way over here, we get to the very last few columns, we have that information about the product, okay? So we have the item and quantity for the products here, the item number, a description, the enterprise discount, and then we have over here the total discount and it's online because we cannot put that into that row, right? So you put that here, you add these up, you're gonna get a total here. And we have the subtotal, and then we have the text rate and then the total over here. So as you can see, it looks very ugly, right? Really not really useful. Why haven't all these different, you know, all these cells are empty. What do we do with this? Okay, so, and then you can go a little bit further and design that and do something like that. So now we have the same order number, just different line number, line one, two, three, four, five, because it's person ordered five items. It's in the same invoice number and the same date, same vendor, same company. So over here is repeating data, as you can see. Duplicate data until we get to all the way over here to the products information, right? So we have the product quantity here, that amount, number, and then the cost, right? We did all of these just to satisfy these items here. And then over here, we still have some problem because now how do we deal with this, right? So then you can see some problem over here rising. And then you can, I can go a little bit further down. So now that I know this information, I can then classify them this way, okay? So anything in white here, anything in white columns, will be in the invoice table, for example, all the turquoise color can be moved to a table called vendor. It's all about the vendor by the seller, all in the yellow box over here is about the customer address, billing address. So both of these are actually about the customer, but they have a billing address and they have a shipping address. So we can have like two, actually, a customer table and then you have a shipping or an address table out of that and you link that to the type of addresses, right? Either a billing or a shipping address, okay? So you see now that can be broken down. And over here we have the sales person has only about the sales person here. We can add more information about the sales ID, sales person ID and so forth here. Again, anything white can stay in the invoice table or be on the right side, keep going down further and we have the products information. So we have the, you know, everything in the red box here will be under the order detail table maybe. So we have the quantity and the discount prices. And then we have three columns here can be classified and build a product table just to contain the product, right? So each product has a unique number. So this could be the skew number or added number doesn't matter. And if you look at this table here, it's unique, right? It has the item number, the description and it's unique price. The discount price is given, it's not in the product table because it's just based on season, right? Season or maybe different rules. So that'll be in a detail table, okay? And then the other ones over here in gray are known as derived data or calculated totals only because these can be calculated based on the quantity and in unit price, okay? So we do that, that these are just calculated so therefore they should not be included in the table. Okay, so these gray ones should be removed, right? The sales tax stays because it will stay the same or it may change, but these are dependent on just the unit price and the quantity. They can be calculated so we don't need it. Same thing for the total discount. This total discount here is dependent on the sum of all the discounts, right? This plus this plus everything here you get 36. So this is calculated. Subtotal is also calculated, right? You just add all these up and then subtract the discount and get the subtotal. And then the total is also calculated. So you can see anything that is calculated should not be on the table, okay? So this is an example of how you would build this sample order into this table here, okay? So that's just what I wanna show you though. So now let's look at a database design where we can normalize, not this one here, this one here, okay? If you look at this table, it's an employee table. There has some information here. And again, we're gonna look at those rules. Which one of these columns can be used as the primary key, right? So in this example, it's pretty obvious. We really know the employee ID will be a best candidate. But if it's not that case, you have to test it, right? You test the same rule that we test earlier about the functional dependencies, okay? So again, if you look at the ID for one, two, three, four, fine, if you look at, so there's only one per row. So that's okay, that's neat. If I see this ID here, well, I always see the name Jane Doe. In this case, yes, right? Well, I always see the same data booth. Yes, so it does describe this employee, right? So it determines the name, determines the data booth, well, I always see the age in this case. Yeah, so in this case, if you look at same ID here, there's only one record for each person. So you can say this ID determines all of these information about that person, okay? Same thing here, when I see this ID, it only determines Jane John Smith, okay? So that is a potential for a candidate key, right? It's the determinant. There you go through the next one. Well, but what about the name? If I see Jane Doe, well, I always see one, two, three, four, five. In this case, yes, because all the names are unique. What if I say, what if I happen to, you know, change a forest gump to, you know, Bob Bucker? Another person can have the same name, right? You don't know that. It could be the same person, it could not be. I can have two Jeffries, right, the same name. So it's possible. So the name is duplicates here. So these, yeah, Jane Doe is fine. Until it get to Bob Bucker, it determines that Bob's ID is seven, eight, five, four, three. But down here, I see Bob Bucker again. This time I get a different ID, I get different formation. So therefore we say that the name is not used. Can I be used to determine the ID, right? So this column fails. Can I use to determine the date of birth? No, because when I look at Bob Bucker, they have a different date of birth. So can I be determined? Can you determine the age? Well, Bob and Bob, six, nine and 45, no. Can I determine the department? No, different department. They're not the same, they're not the same. However, it does determine the pay code, right? So Bob, pay code, same. Bob, pay code, it seems to be the same, right? So we say that, yeah, the name can be used to determine the pay code, but only for in this case. What if I have another person like Peter Parker, and then, you know, and Bruce Wayne is also Peter Parker, right? So if that's the case, then they no longer have the same pay code. So you can see the problem here that the name cannot be used to determine any of these columns, right? So you go to the next one, same thing, right? You look at this date of birth here, this is always determined the name for Jane Doe, an ID here. It's possible that this person, like Al Bundy, could have the same birthday, right? There's a lot of people who have the same birthday exactly the same day, same year. So in this case, it's also not possible to use this to determine, because I could have a different age, different HR department, different title. So this is not used as a determinant. Age, same thing, right? Age can be any, because there are multiple people can have the same age, and they may not be the same department, they don't have the same ID, okay? HR, if you look at the department, HR can have multiple people in the same department, like IT, for example, if I see IT, do I always see 18, no? Do I always see 7, 7.2001? And this table, yeah, because you need, but I think about a million records, right? It's possible that you don't see the same date of birth, because I don't see here, right? It's different. So again, it has to be exactly the same in order to determine that column. Can I see it Peter Pan? And this is also Peter Pan? No, it's not, so it fails, right? So again, the department is not possible. Can I be used to determine all the other columns because they are different, okay? Title, same thing, manager. If I see manager, do I always see HR? Down here is, there is not, so it already fails, okay? Do I always see the age 32, two year, 55? No, failed. Do I always see the same birthday, 123? Here is 123, here is not, so failed. Do I always see Jane Doe? No, I don't, it's L Bundy, right? Do I see the employee ID? So again, the title cannot be used to determine any of these columns. What about the right side? Manager is SA, manager SA, hmm, appears to be, right? Appears to be SA, SA means like a salary, HR is hourly rate, okay? So appears to be that way, so based on this data alone, we can say that it's possible. But if you move it down here, electrician SA, I don't have a lot of information here, but it's possible that an associate is HR, I can have another employee is also an associate who also has a salary as well, okay? So it's also not possible to do this way. Then again, the pay code cannot determine all the other columns, because they're all different. The same thing over here, okay? So in this case, only the employee ID is used, can be used to determine all the other columns. If I do 12345 and Jane Doe, if combine this together, well, I always see the same information, in this case, yes, right? So if I combine both of those, I can use those as primary key and determine all the other columns, that's fine too. So that is also a potential for a composite key, right? But since this already, a single column can already be determined, used to determine all the other columns, we don't need additional columns to use as a primary key, okay? So in that example, then what we're gonna do is we're going to, so we're gonna say that this column, I'm gonna duplicate this, okay? I'm gonna duplicate this. And we're gonna say this column, is gonna be the primary key. So I'm gonna add this as the primary key, for your PK, and I'll mark a different color so we can tell that it's gonna be a, oops, we use that as the primary key. And this column is now the primary key. And I'll change the back to a selected forest column, okay? So now it has a primary key and we don't have any redundancy data in here, right? So this is what's called a first normal form, okay? One and F. It says, first one point says employee is the backend with a primary key. All the data are only unique, right? About one type has only the ID as the name only, the date of birth, the age, like the primary is HR. If I have like HR and you put comma, IT or something, then that's not right, right? Because I have two different types of data, or if I have like an age, some kind of a special code, what is that, right? If that's the case, then this rule does not apply. Since I don't have it, everything here, every cell is, I had the single type of data so that rule is whole. And so now we have finished what's called the first normal form. For example, in this example here, I'll put red. So now let's think about the next one. What about the second normal form? Second normal form says that we have to look at repeating, are there any repeating data in here, okay? Are there any information here that is not dependent on the primary key? That means that they have to be dependent, have to be about that employee, right? Remember a table has to be only about employee. If you look at this one here, we see that the employee is from here to here. These are about employee, okay? The department here has some information is repeating. Oh, not repeating, but it's possible that it has some problems here. What if, again, back to the rule, if I ask to give some new data to an employee and they enter some record in here and enter some information and instead of a news, they call it news department, right? So you can see there's an error already here that had news and news department as a title, right? That is potential error that can also have to be avoid. Not only that, if you look at so if you change IT department, suddenly I say, oh, you know what? We don't call it, you know, IT anymore. We call it something, you know, something IT, right? IT dash web or something. Then if you have 100,000 records, you have to manually go inside each of these department, each of these records and find all the ITs and replace them. Yeah, you can do that using the update, which is fine. But you can see all this repeating data can be used in here, right? A lot of repeating data can be avoid. IT is not a big deal. What if customer service, right? It's a really big name. And if I have a hundred of these, imagine number of characters, I have to put in two of those, okay? So those could also be changed. What if I change the pay code instead of, you know, S-A-S, I change it to S-A-L, like salary. So I have to go through all these here, I change those salary as well, okay? So you can see some errors in here. So what I'm gonna do is we want to break this out because this information is not only about that employee anymore, right? So the manager is the title for this employee, but it's not about Jane Doe. You can look at manager. Manager could be somebody else, right? Could be somebody else. The pay code, if you look at that, this has nothing to do with Jane Doe. It's just a pay code. The pay code is really dependent on the title, right? Maybe the title of the manager or whether the person is working full-time or part-time. There's other misinformation here as well, right? And that's something you can look into. Look at the description here, okay? Just some text about each individual person. So that's fine. So if you look at this, you can say that this entire four columns over here, okay? They don't really describe that employee. Because again, if you look at the employee, it's like as a person, okay? What does it have? The person has a name, the date of birth, an age, right? An ID to make it unique. But HR, they can change, right? Something that can change, they usually don't belong there. I can upgrade to manager. I can be downgraded to an associate, right? That is not always stick with Jane Doe. The pay code could also change and description could change. So therefore, this is something else completely outside of the employee itself. So therefore, we have to think about that and we have to move them out to a different table, okay? So again, the rule is that if I do change HR or IT, I have some problem. So what do we do? We're going to change this to a different form. Okay, so when I duplicate this now, let's change this because you'll want to end up here. So we're going to duplicate this. So this is going to be in the 2 and F, I should point out. So this time, what are we going to do? I'm going to remove this. So let's do the department first. The department can be moved out into a new table and what do over here, I guess. So I'm going to copy this. Actually, yeah, let's copy this column. Let's call it over here. I should go over here, okay? And it will have a department ID. This would be the primary key and we'll put it here to some numbers, like one, two, three, four, five, how many departments we have? HR, facility, IT, customer and admin. So we move that to the admin and then we have news. And we have marketing, right? So these are the five unique or six, seven unique tables, positions, so six, seven, okay? So we have a table that is now can be used to classify these department. So that if I change IT, I change it only one place. I don't go here and change IT and in many places. So once I do that, this department becomes the EEPT ID as a foreign key, okay? And then the HR and maps to the ID of one, all right? So this would be one, facility is two, IT is three, customer service is four, IT again, that's a three, admin is five, news is six, then six, admin is five and then marketing is seven, okay? So now you see that once I do that, I can access the actual name of the department via this foreign key, because the one always maps to HR, right? So if I have to rename HR to like something else like human resources, the full name, then you can see I only change one place. I don't have to go here and do that a hundred times, all right? You see the potential or the power of that, right? Put it back, that's how we do that. So the same thing for the title, because I could change the title manager and then have to replace everywhere else. So the same idea, I'm gonna copy this table. Let me remove this two table here, let me delete, clear, let me clear this delete. Okay, so I'm gonna copy this. Only one, two, three, four, five, six, seven, eight. All right, so I have eight of them. Copy this, we're done here. And we'll call this the department table, okay? And this is the title table. So it'll be the title as the PK ID. And then this is the title. And we have manager, I'm gonna copy this right in here. And we'll do a, okay, so six, seven. Manager, electrician, associate manager, this is the director, this is CEO. We have the order, then we have the photographer. And before that, after that, we have the VP for president, every manager is being, okay? So we have about six as well, eight, right? Six, seven, eight, okay? So those are the sales and the motion, okay? So again, so we go back here and let's see the title ID as the FK. So the manager is one, electrician is two, associate is three, director, the manager is one again. Director is a four, CEO is five, our reporter is six, and then photographer is seven, and VP is eight, and manager again is one. So the same, similar to that of the department ID, right? So you can see that we break that up into a different table. This is now, this is the primary key, this is the primary key, oops. And then these two are now the foreign keys are gonna copy and change the font to, I say every key goes yellow, okay? So those are your foreign keys. And do the same thing for the pay code, right? Only three of them, so I'm gonna make one, just, I think it's three, so we copy this, I'll just get one, two, and three, put it right here, right below here, this is the pay code, go pay ID, this is the code. So we have the SA and the HR, and you don't have another one, but we can create one for like undefined, just in case we have one, right? It might be on a, not only but on a contract basis, for example, those be undefined, right? So then we have the SA would be just one, one, two, one, one, one, one, two, and one, one. Okay, so again, this would be the pay ID as FK. We color that, and then now, so now it looks good, right? So these are only foreign keys that are linked to external tables. So this is called a parent table. No, actually, these are the child table because these are now parent table. When you, when the primary key of a table is referenced in another table, this table is called the parent, this is not the child table, okay? Because this table depends on the department, the title, and the pay key, pay code, right? So it's the child table. So we had that going on, and this is supposed to be the two and M. Okay, so that's good, we can leave it as is, if that's, okay, that's fine. And, or we can also, you know, move this out to another table if you want to. It's just a description, or sometimes you can just look at a comment. Will always be the same, may or may not be, because, you know, a director of IT, IT director, I can, you know, if I'm another IT director, I can call it differently, it doesn't matter. So it's like a comment only, you can leave me here, that's fine. Or you can move it out if you want to. Okay, that's entirely up to you. If you move it out, then you can do the following, the same rule. So we can leave it out or we can do, we can move it out. If I do move it out, then it'll be something like the following. But remember that this comment, has to be about the employee, right? So we have to reference the employee ID, because this comment only is only about Jane Doe. This is about John Smith and so forth. So if you do that way, you have to know where this comment is coming from or going to, okay? So if that's the case, then I can create one and over here. So we create, maybe I'll copy, I'll copy three of all, I need three of this. That put over here. That's it put over here on the right side. And the first one will be the description ID. So put here, description, oops, ID. That's the primary key, okay? So make it a bit bigger. So this would be the primary key or change of color to that. This is the employee ID, this is the foreign key. So employee ID. And then this is the comment, right? So this would be one, two, three, four, five, seven, eight, nine, one, two, three, four, five, six, seven, eight, nine, 10. Just a unique ID, right? But these are the employee IDs, okay? So that means that I have to go over here and copy their IDs. Let me go and copy this, it's easier. And put it over here. So now we see that each employee is added to a different table and I can make it sort of, all of these are, let's change the font to be, let's start foreign key data. And then over here, we have what's called the description. So if I do that, I'm not linking here. So this is the child table now, right? For the child table, as you can see, the child table has the foreign key. But the parent table, don't have the foreign key. So in this case, I don't need this column anymore, okay? So I'm gonna delete this column and I'll see that these are all foreign data. So I'll make those pink and then these are the parent table of the description. I take that whole thing wrong. Yeah, this is the description, description table. So I'm gonna have four tables, right? Five tables. So based on that ID, this ID here, I can know the description. So we join the employee table and the description table based on the employee ID that I can get the description, right? Join the employee table on the department table, I can get based on the department ID. ID, I can get the name of the department. Join employee and title, I can get a title based on the title ID. Okay, so now you see that this now has a primary key, determines the department. Primary key determines the title and now this primary key here, right, determines each individual employee, a unique employees. So now what I have here is actually a two NF. So it's satisfying that rule. Okay, because each table is now unique. All the data are only about that employee, that person, right? The ID here is all about the department. All these are about the titles, right? They don't have anything else. These are all the pay code for the pay table and these are descriptions about that description. Again, this data here is a foreign key. So you have to reference that. So that's fine, give it in here. But this foreign key is also related to this description table, right? Okay, so you can see how now these tables are now very normalized to the second normal form. So this is already third normal form, okay? So these three here, these three here, let me highlight this, already I'll copy this, I'll highlight it and I'll make it really big and blue or something that I'll shape this. Actually I won't shape it. Okay, so this is already a third normal form because it already satisfy all the rules. All the data are unique. They are about that table only and there are no duplicates, okay? So this table is already satisfied. So we can leave that as is. So now we look at this table here again, this is the raw table data, right? Now, what do we think about here? Okay, we look at this name here. I see Jane Doe, John Smith. This is not efficient. So what we do is we want to break this into multiple columns. Not only that, you look at this age here, remember this is a calculated number, right? The age here is determined based on the current time. They should be about the current time, right? Unless the age is based on the hiring date, then that'll be different. When I was hired, I was 32. Then that's different. But again, when you were hired, you don't put the hiring age to put the hiring date. So based on the hiring date, you can also still calculate how old you were when you got hired based on your date of birth and the hiring date if you had a date column. So therefore this age column is not needed, okay? So this is calculated column. This has to be removed. This also has to be cleaned up and add additional columns like first name, middle name and last name if that's more efficient. Okay, so that is something we're gonna do. So I'm gonna duplicate this and we're gonna name this, it's gonna be the third normal form. So our final state will be by changing this column. It's gonna split this column into two. And let's see if I can, how do I do this? Insert to the right, it's okay. Okay, so we're gonna add another column over here. We're gonna name it to at last. We'll keep this first. And if you decide to have a middle name, you can add another column there too, okay? But I'm gonna just use two. Okay, I put Jane and then Bill over here. So John Smith, Peter Pan, right here. And then Al, Bundy, or Cat, Bob, Parker. Always playing Peter, Parker, Bruce, and then we have Boris come. Okay, much better, right? And then we don't need this column. So then I delete that and boom. So now we have what's called a third normal form. So they're all on third normal form. So in this case, we'd say this is a normalized database. Okay, so everything is about that employee, right? Every row is unique based on this column here. There's no duplicates. All the fields are atomic, okay? They have the same data type in here. And all these are reference keys, we need those because we need to have some kind of relationship between the two, a direct relationship. Otherwise, how do we know which department they belong to? So these are just foreign keys, so they're fine. And then these tables are already in third normal form. Okay, so you see that now from this unnormalized table, okay, this is useful, however, it's useful when you want to generate report, right? You can still generate this same data from these five tables, okay? Because now this is normalized. So they're normalized the data you would never have the problem when I enter a new employee in here, right? I had this information. So I look at the employee here, I answer an employee, let's say I answer employee like one, nine, nine, three, four, five, and this is like a J, J, Leno. His birthday is I say 12, 23, 19, 56 or something, okay? Which department he belongs to. Let's say he is the director, so he'll be the H, he's in the news, so he'll be in unit department number six, okay? And then he's entitled, he's a CEO, he's a director, so four, and he's paid his salary, so he put one, right? So that's all I need to do here. So I don't have to go and change here. And I can put a comment here if I want to about J, Leno is number one, nine, nine, three, four, five, and his comment is talk show post, right? So we have added a new record to the table, and so we don't have to touch these tables at all. It'll always be correct. As opposed to if we had to do that over here, I have to go here and put like, was it again? Copy this, I think copy this over here, I put over here like this, right? J, Leno, and then 12, 23, 19, 26, okay? At the age, I don't know what to say, all right, 70, 60 from old age. Let's say something like that. And he's a director, he's got news, and he's a director, something like that, I say, and then talk show, something like that, right? So I could potentially mistype Moose Debt, and then I put here like director, I could just do that. You see some problem here, because I have to manually enter this information to these columns. And because of that, I have some accurate data, as opposed to if I do it this way, I just put the correct code number and the name will always be correct. If I have to change the title, I have to go into the title table and change those. If I change it, everywhere else is also changed, right? Same thing for the department, and same thing for the pay code. Okay, so you can see how efficient this is when you do data entry. You avoid any of the problem that you may have. Okay, if I happen to delete like Jay's record, if I delete Jay's record here, I can just delete from here, but I can't do that because if I do that, this table depends on Jay Leno's record. So if I try to delete it, it's gonna crash, because if I happen to delete that, then it doesn't have this ID anymore. So this ID here is attached to that ID, attached to that ID over there, and therefore it's not allowed when you try to delete, okay? This is tied to Jay Leno's ID. So in order to delete this, you have to delete this first. The order, if you do that first, once that's been deleted, then I can go ahead and delete from here, okay? So that's the order of deletion. And so the data you have to insert in the other order, insert employee first and go to the comment description. If you want to delete something here, you cannot just like, okay, I wanna delete marketing. You can't do that because marketing has been used in here. Okay, so if you do that, you have to go into the child tables, remove all the data first before you can remove from the parent table. That is the rule and also, this is also why it's so important so that your data are contained together by using what's called this term up here. Up here, I mentioned it very top. This word called referential integrity constraint, okay? Because there are reference and another table, I cannot accidentally delete the department here without going to all the other tables that depend on it and delete all this information. If I have to delete this record seven, I have to delete Peter Parker information out so I can, no, I mean, delete forest grump so I can delete it. It's only one that use it. If I delete news, I have to delete both Lois Lane and Peter Parker out of that. All the information has to be deleted in order to delete that. So in a way, you're protecting your data just in case you accidentally delete them, okay? So you have that referential integrity constraint. So same rule applies all this information here.