Hello and welcome. My name is Shannon Kemp and I'm the Executive Editor for DataVersity. We'd like to thank you for joining this month's installment of the DataVersity Webinar Series, Big Challenges in Data Modeling, moderated by Karen Lopez. Today, Karen will be discussing supertyping and subtyping with guest speaker Dr. Gordon Everest. A couple of points to get us started: due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A section, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag BCD Modeling, for Big Challenges in Data Modeling. As always, we will send a follow-up email within two business days containing links to the recording of the session and additional information requested throughout the webinar. Now, I'm going to introduce today's guest speaker, Dr. Gordon Everest, Professor Emeritus of MIS and DBMS in the Carlson School of Management at the University of Minnesota. With early retirement, he continues to teach as an adjunct. Besides teaching about databases, he has helped many organizations and government agencies design their databases. His approach transfers expertise to professional data architects within those organizations by having them participate in and observe the conduct of database design project meetings with the subject matter experts. He is a frequent speaker at professional organizations such as DAMA. As you already know, our esteemed moderator, Karen Lopez, is a senior project manager and architect at InfoAdvisors. She has 20-plus years of experience in project and data management on large multi-project programs. Karen specializes in the practical application of data management principles. She is a Microsoft SQL Server MVP specializing in data modeling and database design. She is an advisor to the DAMA International Board and a member of the advisory board of Zachman International.
With that, I will turn it over to Karen to get us started. Hello and welcome. Hi, Karen. Thanks. Hi, Gord. How are you doing today? Hi, Karen. Just great. Yeah, so we're just going to try something different. We're going to use our webcams just at the beginning, mostly just to say hi to everyone and make it less about just looking at a screen. I wanted to say thanks, Shannon. As usual, you do a wonderful job of getting us all set up for these things. We couldn't do it without you, nor without DataVersity and all the people behind that who make this happen. I see you've got the Enterprise DataVersity training event listed down there, where I'm going to be teaching a couple of half-day courses, and that's how the whole event works. It's not just a conference, it's a series of in-depth tutorials about some data modeling and data management topics, and I'll be looking forward to that coming up, so I recommend it. One of the reasons why I wanted this topic this month is that we've done a lot in the last few months about the process of modeling, the challenges of working with different types of team members, and all of those things. I really wanted to get back deep into actual data modeling issues, and the first person I thought of was Gord, not just because he's a fellow Canadian-ish person, at least at one point in time, just like all of us should be somewhat; we need more Canada in the world. He's done a series of these presentations, sometimes as a double session, at the EDW conference. It's standing room only when he gives it. He strikes a good balance of theory and practicality, and today I've asked him to share some of his slides from these presentations. There's a whole bunch to go through. We're definitely going to try to take those questions.
I'll be looking at Twitter and in the chat, but if you really have a burning question or an interesting question, or you just want to say something, please put it in the Q&A part of the panel that you have for the webinar platform so that we see it. The thing is that yes, the slides will be made available. Yes, there will be a recording, and please use that BCD Modeling hashtag when you tweet. I'm going to turn off my webcam now so that I can focus on doing all of those things. Thanks, Gordon. It's up to you what you want to do. But I think I'm going to go ahead and give you presenter control so that you can go ahead and start with your slides. There's the opening slide. Hey, it looks just like your picture. There you go. At least I've got a suit on there. Just for your interest, you were saying how long you've been involved in data processing. I programmed my first computer in 1960, and we had an LGP-30 made by Bendix. It was all absolute code, no assembler code, no languages. It was a great fun machine to code, with just ones and zeroes all across every line of your coding sheet. Think of debugging that. We've come a long way since then. In this talk, I'm going to assume that the attendees have a decent amount of knowledge about data modeling. If you don't know anything about data modeling, you might be lost here. I'm not going to talk about tools. I will spend a little bit of time on notations that you might see in some of the tools. I'm not going to talk about implementation, except for my last slide, where I get to the question of how you reflect this stuff in, or convert it into, tables, because we don't talk about tables when we're talking about subtypes and supertypes. It's something that has to precede that. The tools that you do use will oftentimes form your opinion about what subtypes and supertypes are all about.
We need a fuller understanding, however, to recognize the limitations, especially the constraints, and to properly use the constructs in our own data modeling. I call this the most valuable construct. You all know that there are entities, and there are attributes, and there are relationships, and there are keys. Beyond that minimum, there are subtypes and supertypes. Just as background, the fundamental assumption in data modeling is that we work with entities or objects. We'll draw a box in our diagrams, and that box represents an entity type. We'll put a name on it and define the structure, but most importantly, that box indicates a population of individual instances of something. We tend to group into types or into these populations. It's essentially arbitrary how we do that. That's a view imposed by the designer. The world isn't naturally that way. And one of the tacit assumptions that's always made in relational systems is that those populations are strictly disjoint. They're mutually exclusive, non-overlapping. In other words, an individual cannot belong to more than one. You may think that they do, but the system is always making the assumption that those individuals are distinct. Okay. So is this always true? If we take an example of employees, customers, and shareholders, are those three distinct populations? Well, you can tell me that a person could be all three of them, right? Are there likely to be three files or tables in your organization for these three? Yes. And the question is, now we have this impasse, so to speak. How do we model these? Well, that's what we're going to talk about now. So subtypes and supertypes allow us to formally represent overlapping populations. That's the key idea behind subtypes and supertypes. And remember that every individual member of a subtype is also a member of its supertype, of all of its supertype populations.
I don't plan on spending a lot of time, unless somebody asks, on multiple supertypes, sometimes called multiple inheritance, because most of the tools don't offer that. It adds some complexity that goes further than we need to go today for a proper understanding. Okay. So here's an example. I'm showing different roles that are played by members of a common population. Another approach here is to model different states of an entity over time. Those are two common reasons why we want to use subtypes and supertypes. At the bottom, with the different states, it may or may not be that an order can be in multiple states at the same time. One of the things I was taught a long time ago about subtyping this way, based on a life cycle of something (and these are sort of states, but generally orders kind of have a life cycle to them as well), was that this was not a great way to do subtyping. Any ideas why that might be? Well, the issue is, are there extra attributes that you want on a thing in any of those states? Obviously you can define a record that's going to include all of the information that you would be using for it in any one of those states. That could be unwieldy. Subtypes and supertypes would be the way of dealing with that. Okay. I'm going to hit on something a little bit later too: I think most of the objections that people have to supertyping and subtyping are really focused on the physical implementation of them. So I don't want to leave that until the end, but I just wanted to bring it up, as I'd like the audience to really think about the rules that they've been taught and think about why they might have been taught that. I think a lot of the rules that I hear from people, either from consultants telling people or from what I see in books, are really about physical implementation issues with subtypes and not so much about the logical modeling of them.
Yes, and we're not talking about implementation here; the last slide that I'm going to use will be talking about how you form the tables, when you build the tables, and that's when you're beginning to get at the question of physical implementation. Let me say the most important thing that you must always keep in mind when thinking about subtypes and supertypes is that you're always talking about populations, and you're looking at individuals that are in different populations. So it all has to come back to the question of populations. Okay. Yep. So there are really two things that are valuable in the use of subtypes and supertypes. Number one is that we formally represent overlapping populations, and number two is that we can defer the question of where to build the tables. And if we start thinking about tables too soon, we're going to get into trouble. Okay. So it's common to think about a subtype-supertype as a relationship. It's tempting to call it that, but it's not a relationship. A relationship is something that's between members of one population and members of another population, and those could be the same population in a recursive or reflexive relationship. It's always between different instances. So we say that a member of the subtype is a member of the supertype; you're saying the subtype "is", so we say an employee is a person. Okay. In fact, if you think of it as a relationship, you're thinking that the one thing in the subtype is equal to the one thing, the same thing, in the supertype. Well, does it make sense to model the fact that a thing is always related to itself? Okay. How do we think about using subtypes and supertypes? There are basically two ways of recognizing when this can be useful. And remember that we're always still thinking about entity populations. Generalization says, if I look at two populations and I observe some commonalities, then that's justification for introducing a supertype.
That would be a common thing that relates to the members of the different subtypes. That would be considered kind of a bottom-up view. Specialization is where I'm looking at a population and saying, hey, there's a subset of the members of that population to treat differently or specially. They have a constraint applied, or have a different attribute be mandatory, or have additional relationships, whatever. Okay. So, generalization and specialization. Some people call generalization abstraction, and let me just call that out. Abstraction, in my way of thinking, is different. If you look at the Webster's dictionary, it says it's hiding things from view. We're not talking about hiding when we talk about generalization. We're talking about finding commonalities. Abstraction is actually an issue of presentation, not an issue of modeling, whereas generalization is an issue of modeling. And if you want to call generalization abstraction, I'm not going to quarrel with it. Well, anyway, let's go on. Two rules. Two principles must always be true. Every subtype must be a subset, potentially, of the supertype population. If it were not a subset, you wouldn't call out the subtype. And every subtype must inherit all of the roles of the supertype and have additional roles. So in this example, we say a person has a name and a birth date, and that was the basis on which you would come up and say, there's a commonality across employees and shareholders, et cetera. A person has a name and a birth date, and an employee is a person, and we add the attributes position and salary. And then we say a boss is a special case or a subset of employees. And they would have attributes such as organization unit and budget organization. So we see that we can begin to build what you might think of as a hierarchy, but it's not going to be a hierarchy. It can be broader than that.
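As an illustration of the two principles just stated (a subtype population is a subset of its supertype population, and a subtype inherits everything from the supertype and adds more), here is a minimal Python sketch. It borrows the person, employee, and boss example from the slide; the class names, the sample individuals, and the `population` helper are assumptions made for illustration and are not part of the talk. Class inheritance is used only as a convenient stand-in for the population view.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    name: str
    birth_date: date


@dataclass(frozen=True)
class Employee(Person):
    # An employee is a person: inherits name and birth_date, adds these.
    position: str
    salary: float


@dataclass(frozen=True)
class Boss(Employee):
    # A boss is an employee: inherits everything above, adds these.
    organization_unit: str
    budget: float


def population(entity_type, individuals):
    """Return the individuals that belong to the given entity type (hypothetical helper)."""
    return [x for x in individuals if isinstance(x, entity_type)]


# Made-up sample individuals.
individuals = [
    Person("Pat", date(1980, 5, 1)),
    Employee("Lee", date(1975, 3, 9), "Analyst", 60000.0),
    Boss("Kim", date(1968, 11, 20), "Director", 95000.0, "Finance", 1000000.0),
]

persons = population(Person, individuals)
employees = population(Employee, individuals)
bosses = population(Boss, individuals)

# Moving up: larger populations. Moving down: more attributes.
print(len(persons), len(employees), len(bosses))
```

Every boss appears in the employee population and every employee appears in the person population, which is the subset rule; the extra fields on each subclass are the additional roles.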
So we're getting some questions and comments about where the data modeling pattern of party and party role sits here. We're getting some comments asking, aren't these really subtypes of roles and not really subtypes of people? Well, think about the population. The population of party is going to be lots of different things, and you're going to want to establish that one of the attributes of the person is going to be a role. And I can establish subtypes of party on the basis of the roles that they play. And that would be perfectly consistent with what we're trying to do here. Right. So you talk about discriminators, as I call them, in a couple of slides. I wonder if the reason a lot of us go to the party role thing is also influenced by the fact (and you might be talking about this later, about tools) that most data modeling tools only allow an entity to be a subtype of only one supertype at the same time. And that role allows someone to be an employee, a customer, and a shareholder all at the same time. And also a married person, an unmarried person. You know, all different reasons. Persons definitely are a great candidate for subtyping, because we as a society have all kinds of distinct data that we keep about types of people. Right. Yep. Let's perhaps bring up the question of a shareholder. Could an organization be a shareholder as well as a person? And the answer would be yes. So, you don't have a shareholder as a subtype of both person and organization unit, because remember, the rule is that every member of the subtype must be a member of all of its supertypes. All of them. Right. Yeah. Must be. Because when you inherit all of those characteristics, a subtype is a subset of each supertype population. Must be. Yes. That's the definition of subtypes and supertypes. I will answer the question. Yeah. Maybe as you go through the rest of the slides, you'll be answering that question, I think, of why you see the subtype.
Let's get toward the end, and then we'll see if people still don't have an understanding of that. Okay. So, as we move up this hierarchy, let's call it a hierarchy, we have more instances, or larger populations; as we move down, we have more attributes. That's just a handy visual to recognize what's going on here. Okay. So, how do we diagram this? There are basically two ways. One way is the nested one, using an Euler diagram. This is intuitively very good because we can see now that an employee is a person. The boss is an employee. We've added shareholder in here. It's clean, it's compact, it's visually intuitive. However, it is not very good at representing complex cases, particularly when we get to representing constraints on these subtypes and supertypes. Okay. That's one way. The more common way is what I call the separated one. We have separate boxes for the subtypes and the supertypes. And then we have a heavy arrow. You can pick whatever you want, but it's not a relationship. It's a heavy arrow that indicates that employee is a subset or subtype of person. It's visually less intuitive. It's more cluttered in the diagrams. But it makes it a lot easier, as we'll see, to represent constraints. I'm just showing you how it's done in extended ER or IE or IDEF1X, typically. We'll see a little bit more about that. So, of these two ways, I'm going to focus on the separated one. Okay. Let's talk about constraints on these subtypes and supertypes. First of all, a constraint should truly be a constraint. In other words, if you don't have the constraint, you have the more general case. And the general case is that we can have overlapping populations and that a member of the supertype does not need to be in any of its subtypes. Okay. So, the constraint on the first one: if it's not overlapping, then we have to declare an exclusion constraint across two or more subtypes. You could think of that as the at-most-one. You can be in at most one of your subtypes.
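The at-most-one idea can be sketched in a few lines of Python. This is only an illustration, assuming that each subtype population is held as a set of individual identifiers; the function name `exclusion_violations` and the sample identifiers are hypothetical. With no constraint declared, overlap is simply allowed, which is the general case described above.

```python
from collections import Counter


def exclusion_violations(populations, constrained_subtypes):
    """Return the individuals that sit in more than one of the listed subtypes.

    populations: dict mapping subtype name -> set of individual ids.
    constrained_subtypes: the two or more subtype names covered by one
    exclusion constraint.
    """
    counts = Counter()
    for name in constrained_subtypes:
        # Each population is a set, so an individual is counted once per subtype.
        counts.update(populations[name])
    return {individual for individual, n in counts.items() if n > 1}


# Made-up populations of a common supertype (for example, person).
populations = {
    "Employee": {1, 2, 3},
    "Customer": {5, 6},
    "Shareholder": {2, 4},
}

# General case: nothing declared, so individual 2 may be both
# an employee and a shareholder.

# If an exclusion constraint were declared across Employee and Shareholder,
# individual 2 would violate it.
print(exclusion_violations(populations, ["Employee", "Shareholder"]))

# A constraint across Customer and Shareholder is satisfied by this data.
print(exclusion_violations(populations, ["Customer", "Shareholder"]))
```

Note that the constraint names the subtypes it spans, so it can cover only some of the subtypes of a supertype rather than all of them.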
Is that what we sometimes call mutual exclusivity? Yep. The subtypes are mutually exclusive. Right. And certainly in some of the tools, there are some ways that that's expressed. Sometimes it's more of a property of the subtyping in your tool. And then other notations use the arc notation. Do you have a preference there? No. No. I'm just asking; it's okay that you don't have one. Which communicates better? And I guess I'm not going to make a judgment about that. Okay. All right. So, I think a lot of people have been asking for this arc notation in tools because it's actually on the diagram, whereas when it's a property of the subtyping, it's really sort of hidden in the metadata and there's no visual way to see it. And this is tied to IDEF1X. IDEF1X, which we're going to talk a little bit more about too, doesn't have arc notation. And tool vendors, because they want to be IDEF1X compliant for government work, don't do that. And I think that's been sort of a tough position for the vendor product managers, because everybody's asking for it, but it's not part of that standard. Okay. I have to ask what you mean by property. So, I'm using that as a generic term. In the modeling tools, you know, you right-click on something and then you choose a property of it. So, in IDEF1X notation, usually that mutual exclusivity, depending on which notation you're using in your tool, is sometimes just a property of the subtype. Sometimes it shows on the subtype on your diagram, and sometimes it's just something you can report on. Well, if you're using a property and applying it only to one subtype, that doesn't make any sense, because we're talking about exclusion across two or more subtypes. So, for a property of one subtype, what I'd have to say is that a particular subtype population has mutual exclusivity with respect to another subtype population. A member of the supertype population cannot be in both. So, that's what I'm trying to say with this constraint. Yes.
So, that's what I mean. When I say it's a property of the subtyping, I'm really thinking of the subtyping object on your diagram. So, again, I'm talking about how tools implement this, not databases or anything. It's about the features of the tools and how they've interpreted some of the... I think one of the things we're going to keep talking about here is how different tool vendors have implemented subtyping and how that ends up being a challenge for us. Let me clarify this a little bit. You're talking about a property, and it's really not a property of the subtype. It's a property of the supertype. In the supertype, you say... Yes. Yes. This thing can be an employee and a shareholder at the same time, and therefore it's not exclusive. Your word is a property of the supertype. It's... The word I'm using is subtyping, so not subtype. I'm trying to avoid the word relationship, even though all the tools tend to call it a relationship. See how this gets all convoluted? It's actually something where you right-click on the subtyping, so the lines between the supertype and the subtypes, and you put the property there. Well, that makes sense if they're doing that, because it has to be with respect to two of the subtype-supertype lines. Yes. It's in the... IDEF1X, of course, does the separate entities. You put all the properties for the subtype on the single line. You know how they split it up. The single line, but underneath... I'll say underneath the supertype. So there you put it. It still ends up being a constraint on the entire set of entities. Okay. Got it. And that in itself is part of the problem, that it's an all-or-nothing deal. And that's an unnecessary restriction on the notion of subtypes and supertypes. I think this is going to make that clear. Yeah. Okay. Now, we declare constraints for the more restrictive cases. So I'll say that in the first case, you must be in at most one.
In the second, the constraint is that you must be in at least one. Okay. And now we're talking about the members of the supertype population. Okay. So basically there are those two; there's really a third case, but I'm not going to get into it. So with exclusion, if you don't say anything, we're assuming that you can be in any number of the subtypes. If you want exclusion across them, then you declare the exclusion constraint. In this example, I've got man, woman, and child as subtypes, and because we draw that dotted line across those three arcs, that says that you can't be a man or a woman and a child at the same time. You can only be at most one of those. And then the other constraint, with the dot in the circle, is the totality constraint. It's sometimes called covering, or dependency, or mandatory, whatever you call it. That says that every member of the supertype must be in at least one of man, woman, or child. Okay. So here's another example. And this is a very real example, where we're showing that it's not the same across all the subtypes. In this case we're saying that animal is the supertype, the subtypes are oviparous, mammal, bird, and fish, and we say an animal cannot be both a mammal and a fish at the same time. Okay. So put the X between those two, and notice that the X only involves two of the subtypes. Say you can't be both a mammal and oviparous, and we put an exclusion constraint between those two. We know that B is a subtype of O. In other words, birds are a subtype of oviparous. Oviparous is egg-laying, I believe. The point to be made here is that the declaration of subtype constraints can be across only some of the subtypes; it doesn't have to be an all-or-nothing thing. That is a limitation of your tool, if indeed that's the way it's done in the tool. And in the EER notation, that's the only way it can be. For all of them, it's that way. Here are some notations.
I've got the two characteristics, exclusive versus overlapping, and then total versus partial. And I note how it's done, how it's notated, in these five. The two that get it right are object-role modeling and extended ER, due to Teorey. So can you tell us a little bit about what extended ER is, because I think most of us aren't familiar with it. Oh, really? Yeah. Okay. If you look at Teorey's books, and he's gone through several editions (this is Toby Teorey at the University of Michigan), he introduced the notion of an extended ER, and it was simply ER extended with subtypes and supertypes. That is what he added to it. I guess I just have to go on now. Sure. Do some research on that. Anyway, so you can see with IE, that's information engineering, due to Clive Finkelstein. It doesn't say anything about total versus partial. There's no notation for that. And the notation for exclusive versus overlapping is to put an X for exclusive if they are. Notice that it has to cover all of the subtypes, because there's no other way, unless you have two of these little half moons underneath the same supertype. I've never seen that, and that would really get confusing in the graphic. Yeah, you can do that. So one of the things I wanted to point out here is something that a lot of modelers don't understand. Most of the modeling tools these days let you choose a notation, so you can choose information engineering, or IE, notation, or IDEF1X. And IDEF1X is, you know, what most of the tools support. But they think of it as though this just changes the cardinality indicators on the ends of relationships from circles and dots to crow's feet and boxes. And that's a regular relationship, right? And it makes some other sort of, you know, aesthetic changes, right? They think of it just as choosing a different notation.
But one of the things that happens under the covers in tools is that once you switch or choose your notation in the model, if you choose IE notation and then go to set the properties of a subtyping, you only get those two choices of mutually exclusive or non-mutually exclusive. And then if you were to change it to IDEF1X, you'd only get, I think, covering and non-covering or something like that. And so it's common that I work with models that have been worked on for a long time, where teams tend to use IE notation for the logical model and IDEF1X for the physical, and they don't even realize that they've got this metadata as part of their model. The data is still there in your model, but you're only allowed to see, report on, and play with one set of these constraints. Yeah, so this chart here should make it clear. IE, at least the original IE, didn't have any representation of partial versus total. And if you then want to convert to IDEF1X, you don't know whether to have the double line under the circle or not, because you haven't had that information. On the other side, IE does distinguish between overlapping and disjoint. In IDEF1X, you only have the disjoint case. You cannot define overlapping populations. A lot of people will take that and say, oh, you can't define overlapping populations in subtypes. Well, that's not true. That's a good point, too, because they've only... So I tend to enjoy working with the IE notation more, maybe because I find it easier to talk to business people with it. They seem to understand the crow's feet. I'm choosing it mostly for that cardinality expression. And that's just a personal preference of mine. It's not a best practice or a standard.
I find that when I start talking about these different sorts of constraint properties of subtyping, people are only familiar with one set, and they don't understand that in order to get everything they've said in their model out of the tools, sometimes they have to switch notations and do a report, then switch notations and do a report again. If somebody's writing a tool, they need to look at this and provide both options for both of these characteristics. Yeah, and in fairness to the vendors, they get measured on their compliance with the external standards, especially IDEF1X. And like any tool vendor, you don't want to be out of compliance with it. The IDEF1X standard doesn't have these. So they don't want to support it, but they hide it, so it's still in your metadata. You just can't see it. Okay. I understand the dilemma. Maybe what they need to do is to get on a standards committee and extend it and do the standard right. Yeah. I've been there. I've been on several standards committees over the years. And you need to have a good understanding first before you get into the standard. We've got standards that are weak, that don't cover all the cases, et cetera, and we build our tools to conform to them. And so we do have the kind of confusion that we're talking about here. What I want the listeners to get here is that we have these two characteristics and we have constraints to define on them. Notice that object role modeling has the correct default, the less restrictive case, as the default, so it doesn't require any special notation. Notice that EER, extended ER, is basically a three-value logic, because they have a D or an O. What do you do in the notation if you don't know? Well, I have a question. Can you explain ORM a little bit? I know you have whole presentations on that as well, and I think I've shared with you in the past that I learned data modeling using NIAM, so quite a few decades ago.
But maybe you could explain what ORM is, what object role modeling is. Well, I can say that NIAM was what Sjir Nijssen came up with back in the 70s. I actually taught a course with him over in Europe back then. And he went to Australia and met Terry Halpin, and together they published a book with Prentice Hall in 1989, where they changed the name to Object Role Modeling because they had extended it from just binary relationships to n-ary relationships. What was the big difference? So, you'd say that Object Role Modeling is the follow-on to NIAM. The thing that's really different about Object Role Modeling is that there is no notion of an attribute. You don't think about building a table for an entity and sticking in attributes. The problem with that is that we often do it incorrectly. A way to find the mistakes that we made is to apply the rules of normalization. And the solution is always the same: if you violate a normalization rule, what you have to do is decompose the record. In other words, you put something in there that didn't belong. In Object Role Modeling, we don't think in tables, at least at the outset. We just define objects and relationships. That's what it's called, ORM; Object Relationship Modeling, you could think of it. So, we define objects, we define relationships, and then we say that an object has an attribute, and an attribute is an object that plays a special role in a relationship with another object. So, note the definition of an attribute: you need objects and relationships before you can ever talk about attributes. So, why don't we just talk about objects and relationships and forget about the word attributes and forget about tables? That's what Object Role Modeling does. And consequently, we never have to worry about normalization. Excellent. That's good, because I refer to it as fact-based, not debate. It is. It is fact-based, but what's a fact? A fact is a verb with one or more objects. You could say a person smiles.
That would be a unary fact. It's got one object and one relationship. A relationship always expresses a verb, and objects are expressed as nouns. And if you follow that rule, then you can always make sentences out of it. What [inaudible]? Sounds like a made-up word. Yes. Okay. You can continue on now. Okay. You can see this for the other models. I find it interesting that UML, which is often touted as the ultimate modeling scheme, doesn't even have any notation for subtypes, supertypes. You can express it in English labels. Okay. So the next thing we talk about is when a member of the supertype population belongs in a subtype population. And when we define the rules for that, we say we have well-defined subtypes. The definition of membership in a subtype population is always based upon an attribute or attributes of the supertype. And it's the characteristics of that relationship between the object and the attribute that determine what the constraints are on the subtypes. So in this example, I say a patient. A patient has a gender or sex associated with it. And in the relationship between patient and sex, every patient must have a sex recorded, and there could be at most one that is recorded for each patient. You can't be both male and female at the same time. And it says there could be multiple patients that have the same sex. Okay. We understand the many-to-one and the dependency in that relationship. Okay. Then we reflect that in membership of the subtypes. Okay. Whoops. Okay. The fact that a patient must be in the subtype of a particular sex is reflected in this mandatory relationship. Can I get a pointer? Choose the pointer up next to where the slide numbers are. Yep, yep. If you do that, this one right here. You can get an arrow, and then you can also do the pen, depending on what you want to do. Okay. Can you see that? No. Are you seeing something? I've got an arrow now. All it does is change the nature of my... Anyway, let's not worry about that. Okay.
So in that respect, the dependency or optionality characteristic of a relationship is what is going to determine membership in the subtypes. So if I say a patient must be of a particular sex, then I'm going to put the totality constraint here and say a patient must be in one or more of the subtypes. Okay. That's what that constraint says, and it's a direct consequence of the nature of the relationship. Okay. So in the next one, we look at the exclusivity. Come on. Huh. That's interesting. Oh, there it goes. Okay. So the fact that a patient has at most one sex is reflected in the exclusivity constraint that I'll place between male and female. And that says you can be in at most one of those. So you've got to be in at least one and at most one. This is perhaps an obvious case, but that's how the characteristics of the relationship are reflected in membership in the subtypes.
When we can define the rules for membership, and you can think of that as a constraint on membership, we have what we call an intensional set. The subset is an intensional set. Sometimes I don't know what the rule is, in which case we would call this an extensional set. And that means you're a member of the subtype because somebody puts you there, not by any rule. It is always possible to have a rule, and most systems, if they're going to have defined subtypes, are going to want one. You can always have a rule that says, I'll conjure up an attribute that says which subtype or subtypes you're in. And if you can be in multiple, then it'll be a many relationship to that dummy attribute. So I come up with definitions for the rules of membership. The rule of membership can be based upon a Boolean expression across several attributes. Okay? That's an important thing to note. I've got the simplest possible case shown here, where you have a single attribute and a rule based upon the characteristics of that single attribute. We call that the defining or the distinguishing attribute.
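As an illustration of the membership rules and the two constraints just described, here is a minimal Python sketch. It is not from the slides: the `Patient` class, the `"M"`/`"F"` codes, and the helper names are all assumptions made up for this example. Each subtype gets a Boolean rule over the supertype's attributes, and totality and exclusivity are then just checks on how many rules each member satisfies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Patient:
    """Supertype instance (hypothetical). `sex` is the distinguishing attribute."""
    patient_id: int
    sex: Optional[str]  # None models "not recorded"


# One membership rule per subtype: a Boolean expression over supertype attributes.
# The subtype names and codes are illustrative assumptions.
SUBTYPE_RULES: Dict[str, Callable[[Patient], bool]] = {
    "Male": lambda p: p.sex == "M",
    "Female": lambda p: p.sex == "F",
}


def subtypes_of(p: Patient) -> List[str]:
    """Return the names of every subtype whose rule the patient satisfies."""
    return [name for name, rule in SUBTYPE_RULES.items() if rule(p)]


def is_total(population: List[Patient]) -> bool:
    """Totality: every member is in at least one subtype."""
    return all(len(subtypes_of(p)) >= 1 for p in population)


def is_exclusive(population: List[Patient]) -> bool:
    """Exclusivity: every member is in at most one subtype."""
    return all(len(subtypes_of(p)) <= 1 for p in population)
```

With a mandatory, single-valued `sex`, both checks pass. If the attribute is allowed to be missing, as with `Patient(3, None)`, totality fails while exclusivity still holds, which mirrors how the constraints follow from the relationship.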
Someone's asked, Gordon: sex is just an attribute. Why would you subtype people based on their sex? Can you give an answer to that?
Okay. That's a good question. What you're really asking about is why am I calling out a subtype for males? Think of a medical setting here: there are attributes of males that don't apply to females, and I want to keep track of those. So, you know, PSA and the state of the prostate and things like that; on females, it could be the uterus, how many kids they have had, et cetera.
Yeah. And even relationships I can think of, right? So maybe, you know, a relationship to a gynecologist in this case would apply to females.
Exactly. And so I can have a relationship between this population of things called females and anything else I want, lots of relationships. Notice that in object role modeling, we don't ever talk about attributes, because an attribute is something that you have in a relationship with some other type of object. So gynecologist could be an object, and I have a relationship between gynecologist and females. Everything that you're related to is potentially an attribute of yours, even if it's many. And it would violate first normal form when you try to put it in a table.
Yeah. Thank you. Wait, before we move on, because you're doing great here, and you're going to get to more in-depth things. So this comes to the essence of one of the issues that I've found with subtyping, which is that I have people who don't subtype at all, because it just makes everything so complex. And then I have people who want to over-subtype. And I have a slide for this. I'm not going to show it now. I might show it during the 15-minute after-show thing. But I actually had to work on a project where we had what I called professional philosophers, and philosophy is a great profession. But everything had to be subtyped. Everything. And not just because they had distinctive attributes or relationships, but because it led to clarity.
So we ended up with subtypings that went 95 levels deep, and it resulted in statements where almost everything in our model was a subtype of an entity called agreement. And people were subtypes of agreements, which I said I could not agree with. But to me, there's this concept that you need to have a reason for subtyping. And clarity is a wonderful reason to do something in a model. But people are just subtyping things because they think inventory really is a type of person if you think about it, or vehicles are really the same things as people because people give their cars names and they have ages. And to me, that's sort of a subtyping delusion, maybe? What do you want to say about that?
All right. I'm going to go to another slide that I've added to the end of this. I can come back here. Okay. It's this one. This is what gets at it. First of all, let me say that any combination of attributes could serve as the basis for subtypes, because we said that the rule of membership is a Boolean expression across the attributes. Any attribute or any combination could be the basis of a subtype. We have thousands, if you like, in that scenario. So we have to pick those that matter. And they matter when we have a subset of a supertype population that we want to do something special with. Period. That's the only reason for calling out a subtype. We are not trying to build a taxonomy, for example. In a taxonomy, the rule is that all of the subtypes must be mutually exclusive and collectively exhaustive of the supertype population. Well, always applying both of those constraints just doesn't make any sense. That's a very different question. A lot of people think of subtyping and supertyping as building a taxonomy. It is not. We choose what we want the subtypes to be. That's really an important point. So let me talk about what you were saying here.
The world, we say, consists of individual things. It's the population of all instances, no types. As a designer, I haven't tried to lump them into anything. So you can think of that as being the n-types case, where everything is its own type, or the no-types case, just individuals. Okay. We're not going to get very far with that one. So the first thing that we do is we group them. Those dots on the bottom are the individual members in the world of things. And I can group them at the first level and say, okay, I've got products and I've got organizational units and I've got people and I've got invoices, et cetera, et cetera, et cetera. So I will begin to cluster them into groups or populations of things. And then I will find commonalities. At the lower level, I might have employees and shareholders and customers, and so I say, oh, there are some commonalities, so I'm going to form a supertype. And I can keep going up this hierarchy, and as you suggested, if everything falls underneath agreement, that's defined as the thing that's at the top. The thing at the top is going to be a single population; it would be one type. In the OO world, this would be called the root class. Everything is a subtype of the root class. That's what Jeffen called the universal relation. The universal relation is a relation that includes everything in my world. Okay? Now, is that helpful? It's a good theoretical construct to try to understand. But anyway, we're going to look at that universal relation and we're going to say, hey, there are subsets of things that I want to treat differently. Think about how many attributes you're going to have in the universal relation. You're going to have a column for every possible attribute of everything that's in it. That is going to result in a rather sparsely populated table with lots and lots of missing values, of course. What would you pick as the identifier?
Interesting question, because every table has to have an identifier.
I think that this really explains it; you could build this into your models. You could just say everything is a thing. It's just that the modeler chose agreement as his word for thing. I never did understand that, but pick your battles. We've only got five minutes left, Gord. So what part of your original slides, now that I've detoured you, can you cover in the next four?
I have one slide left, actually.
Excellent.
There's generalization and specialization across those. Okay, so let me finish that out. What if the attribute is optional? Then you're not going to have totality. And what if it's multi-valued? Okay, then it's not going to be exclusive. Okay, so my last slide. Where do we build the tables? I said at the beginning that we build the tables at the end of the process, after we've got our well-defined subtypes and supertypes. Well, we have three basic choices. We can build a table based upon members of the population of the supertype, in which case all of the attributes that are on the subtypes have to be absorbed up, flattened up; use your favorite word. And so let's see. I'm showing here, sort of simply, that the supertype has the key k. D is the distinguishing attribute. And then d sub i are the attributes of the supertype that are common across both A and B. And the a dot dot dot and the b dot dot dot are the attributes of the subtypes. Okay? So that's one way to do it. Another way to do it is to build it on the subtypes only. You could call that flattening down, or separation. Absorption and separation are what James Simpson called them in his book. And what you notice is that we are copying down the attributes of the supertype into each of the records for the subtypes. That kind of redundancy can lead to difficulties, of course. Or we can do both. We could store the attributes of the supertype in one table and then the attributes of the members of the subtypes in their own tables.
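To make the three table-building choices concrete, here is a small sketch using Python's standard `sqlite3` module. It is an illustration under assumptions, not the slide itself: the table names (`p_only`, `a_only`, `b_only`, `p`, `a`, `b`) and the single columns `c1`, `a1`, `b1` are hypothetical stand-ins for the supertype P with key k and distinguishing attribute d, and for the subtypes A and B.

```python
import sqlite3

# Hypothetical schema: supertype P (key k, distinguishing attribute d,
# common attribute c1) with subtypes A (attribute a1) and B (attribute b1).
DDL = """
-- Choice 1: supertype only. Subtype attributes are absorbed up and nullable.
CREATE TABLE p_only (k INTEGER PRIMARY KEY, d TEXT NOT NULL,
                     c1 TEXT, a1 TEXT, b1 TEXT);

-- Choice 2: subtypes only. Supertype attributes are copied down into each table.
CREATE TABLE a_only (k INTEGER PRIMARY KEY, c1 TEXT, a1 TEXT);
CREATE TABLE b_only (k INTEGER PRIMARY KEY, c1 TEXT, b1 TEXT);

-- Choice 3: both. Subtype tables reference the supertype table by key.
CREATE TABLE p (k INTEGER PRIMARY KEY, d TEXT NOT NULL, c1 TEXT);
CREATE TABLE a (k INTEGER PRIMARY KEY REFERENCES p(k), a1 TEXT);
CREATE TABLE b (k INTEGER PRIMARY KEY REFERENCES p(k), b1 TEXT);
"""


def build_schema() -> sqlite3.Connection:
    """Create all three layouts side by side in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def load_example(conn: sqlite3.Connection) -> None:
    """Store the same member of subtype A under each of the three choices."""
    conn.execute("INSERT INTO p_only VALUES (1, 'A', 'common', 'a-value', NULL)")
    conn.execute("INSERT INTO a_only VALUES (1, 'common', 'a-value')")
    conn.execute("INSERT INTO p VALUES (1, 'A', 'common')")
    conn.execute("INSERT INTO a VALUES (1, 'a-value')")


def fetch_a_member(conn: sqlite3.Connection, k: int):
    """Under choice 3, reading a full A record requires a join on the key."""
    return conn.execute(
        "SELECT p.k, p.c1, a.a1 FROM p JOIN a ON a.k = p.k WHERE p.k = ?", (k,)
    ).fetchone()
```

In this sketch the supertype-only table carries a null `b1` for a member of A, the subtypes-only tables repeat `c1`, and the third layout needs the join shown in `fetch_a_member` to reassemble a full record.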
Now obviously in this case, we're going to define a relationship in the relational model between the subtypes and the supertype. Okay? So those are the basic choices. Let me also say that you could have any hybrid of these. I can absorb a subtype, whatever, any combination. And there are some constraints on this. What happens if A and B are exclusive? Does the supertype only look like a good choice? Probably not. But the subtypes only look like a good choice if they're exclusive. What about if it's not exhaustive? What happens if a member doesn't have to be in an A or B? Obviously the subtypes only isn't going to work, because I have no table for those that aren't in A or B. I could say, what happens if we had a large number of attributes on the supertype and just a couple for each of the subtypes? Probably we'll want to use the supertype only. What happens if we have lots of attributes on the subtypes and just a few in the supertype? Well, probably the reverse. On the option of both, the partitioning scheme, one of the downsides is that to do any kind of querying, you're going to have to do joining. And joining is the most expensive operation that we have in a relational database. So there's also joining in the second case, too.
Let me just summarize by saying this. The main problem that we find in using tools is that they don't have all the choices of where to build the tables, including hybrid choices. With A and B, I could decide that I'm only going to build a table for the P's and the A's, and the B's will be absorbed up into the P's, the supertype, while A will be called out as a separate table. That would be an option, a hybrid kind of option. So number one is not giving all of the choices. Number two is not representing the full range of the constraints, or making one constraint be the default, or even worse, making it the only choice, if you look at that. That's what some of the tools do.
And number three, thirdly, is not allowing the definition of a rule or a condition for membership in the subtype based upon an attribute or attributes of the supertype. So for anyone wanting to look at the tools and say, what should we do to improve the tools with respect to subtypes and supertypes, those are the three questions that they need to look at. And if you're using a tool that has limitations, you won't properly understand those three aspects: the constraints, the building of the tables, and so on.
Excellent. That was all great. And I agree with that. One of my data modeling mantras is that your tool influences your data modeling choices much more than you realize.
Absolutely.
And that's true not just for data modeling; it seems to have shaped the body of knowledge of data modeling much more than it should have, I think, over the years.
That's absolutely true. And that's why people need to approach this whole question of data modeling sort of correctly, if you like, without the constraints of the tools that they use.
Right. So we've come to the end of our hour, which just means we've ended the formal part of this presentation. I want to thank you, Gord. That was a wonderful overview. I know I've been through a much more extensive version of this. So I highly recommend it to anyone who has the opportunity to hear him present this or any of his topics. Definitely make a point to attend. Do you have any speaking engagements coming up on your schedule?
We have Data Modeling Zone. And I'm going to be speaking at the Seattle chapter on November the 14th.
Of DAMA?
Yes. I've spoken at many different chapters. And I like to do that. Interestingly enough, probably the most popularly chosen topic is subtypes and supertypes.
Yes, it's very popular.
And I think that's the one that's least understood.
Yeah, excellent.
So I'm going to take this one up on Karen.
At Data Modeling Zone, I'm going to be doing a half-day workshop on object role modeling. And Terry Halpin, whom we consider the godfather, if you like, of object role modeling, is going to be there doing two sessions. So that's the place if you want to find out more about object role modeling, and I'm a real devotee of that, as some people will know.
So thank you very much, Gordon. I have to cut you off here so that we can end the recording, and then we can go on to everything else. So thank you again. Thank you, attendees, for your great comments and questions. I'm going to try to get to some more of them. Shannon, over to you.
Thank you. And thank you, Gordon, for this great, great discussion. And thanks as always to our attendees for being so interactive in everything you do. And hopefully, Gordon, you'll be speaking at Enterprise Data World again on this very topic. We really enjoyed the presentation last year. And this year it will be in Washington, D.C., which will be a lot of fun. And let me turn off the recording for you guys, and then you can have your unrecorded discussion.