Welcome. My name is Shannon Kemp and I'm the executive editor for Dataversity. We'd like to thank you for joining this month's installment of the Dataversity Webinar series, The Heart of Data Modeling, moderated by Karen Lopez. Today, Karen will be discussing stuff your database says about me and how to fix it in your data model. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. And we very much encourage you to chat with each other and with us throughout the webinar. To do so, click the chat icon in the top right corner of the screen to activate that feature. For questions, we'll be collecting them by the Q&A section in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag heartdata. As always, we will send a follow-up email within two business days containing links to the recording of the session and additional information requested throughout the webinar. Now let me introduce our speaker for today, Karen Lopez. Karen is a senior project manager and architect at InfoAdvisors. She has 20-plus years of experience in project and data management on large multi-project programs. Karen specializes in the practical application of data management principles. She is a frequent speaker, blogger, and panelist. Karen is known for her fun and sometimes snarky observations on data and data management. Mostly, she just wants everyone to love their data. You can also follow her at @datachick on Twitter. And with that, I will turn it over to Karen to get us started. Hello and welcome.

Thanks so much, Shannon. Thanks for having me and thanks for plowing through that introduction. And I especially want to thank everyone who's joining this event live, because I think the live event is much more fun than watching a recording or just going through a slide deck.
Also, I don't know what the weather is like where you are, but it's a beautiful July day here in Toronto where, as luck would have it, I'm actually working from home today. So we're just going to dive into this presentation. And please do, as Shannon said, tweet and share today's event using the hashtag heartdata. And if you at mentioned me on it, then I might see it and be able to respond to it. So we went through this before. As Shannon went through my overview, I would love to follow you, friend you, or link with you if you send me a request. Just make sure that you mention that we met here at the Dataversity Webinar, because that helps me process your request. So as Shannon also said, formal questions that you want to bring to my attention, please put it in the Q&A. It's much easier for me to see those there and respond to them, either verbally or in writing. But definitely I try to pay attention to the chat a bit. You can chat with each other, unlike a lot of webinars. We have this great community for that. So great way to do that. And yes, the slides and recording will come out sometime next week. So why do I have this presentation? Well, I don't know about you guys, but don't you just cringe when you go to do something with a company and you find out there's a major data quality problem. With your data in their system and their business processes won't let them do anything for you. So I've been collecting these over the years. And while my husband will laugh at me that it's some sort of karma that's due to me that there's always something going on wrong with my data, I don't know that it's just me. I think I probably pick up on those things and then also mention or tweet them much more often. So I tell people, poor customer service people, well, organizations that have poor customer service because they have poor data quality, that when they don't love my data, it means they don't love me. 
And I recently had an issue with an organization that usually has perfect data practices for me, Air Canada. And they weren't able to check me in because there's an ongoing battle between Expedia and Air Canada where they can't seem to get my name properly moved from Expedia onto Air Canada or other Star Alliance systems. And as we'll see in an example, my name gets concatenated. And that meant they couldn't check me in and that meant I couldn't get on my flight and that meant that I had to be rescheduled for a flight in the morning which had me in a layover going through a route that I hadn't planned. I was on a direct flight and that actually added an extra 12 hours to my flying time. That was something that they worked with to compensate me for but I'm never getting those 12 hours back and that makes me sad. So my statement is your data models and your databases tell me how much you love me and my data and not just me but all your other customers. One of the other things I want to point out as I go through all of this stuff is that any of these issues could start at anywhere in a data model-driven development environment. A lot of them start at requirements. So this is one of the hardest parts about being a data modeler is that sometimes business users will give me a requirement. And yes, it's my job to implement those requirements but sometimes their requirements are, I'll just say, wrong. And then it's hard as a modeler and as someone on the IT side to say let me give you some examples of cases where that business rule won't work. And one of the examples is someone will tell me that the minimum length for a person's name should be three characters. And of course, there are millions of people on the world, at least a million, I'll guess, who have names that are shorter than three characters. There are certain cultures where that's common. There are certain alphabets where that's entirely possible. There's also nicknames that are shorter than that. 
But I'll still get that. And one of the things we're going to see as I talk about some of these data anomalies is that these examples often happen because someone, an end user, a developer, a DBA, they actually think they're increasing the quality of their data by putting these constraints on the data, but they're not. They're making it worse because they're putting constraints on data that just aren't true. So it could start at requirements. It could happen in the data model because a data modeler or a developer building one or a DBA decides that name should be longer than three characters. It could happen in the database because someone on the application stack decided that all postal codes should be all numeric or that everyone must have a last name or a first name or that everyone knows their date of birth. There's all these things. And actually there's a whole other presentation I have that I'll be doing at a conference in the fall where people believe that nulls are evil and therefore we must put fake data in every column like dates and numbers and names. We must make up fake data so that we can avoid nulls. Then there's the problem of someone going in and fixing the problem. So they've changed the requirements or they've decided that for tuning purposes that zip codes which are always numeric should therefore be integers because integers are smaller and therefore will perform better and they don't realize they've lost all the leading zeros from their zip codes making actually performance get worse as you have to keep parsing and padding. A long transition there. So here are four questions I can't answer about me. What is your phone number? Where do you live? What is your date of birth and what is your name? There's one more as well as what do I do. But I can't answer these because your database doesn't understand me. 
So I think my name is Karen Lopez, that I was born in June of some year a long time ago, that my address is in Toronto and that I have one phone number to give you and that I'm a data architect. I think those things are true about me. But you would not believe the number of databases that think otherwise. So I went through an exercise at one point in time and I realized I had eight phone numbers. Now if you ask most people how many phone numbers they have, they'll probably assume that they have two or three. But then you look at it, you have your mobile, your cell, you probably have a Google Voice, you might have some other thing, your iPad has a phone number if it has a cellular connection, any other connected device that has mobile service, even if it's data only, has a phone number. I actually have one phone number that I think I have and it's a special phone number, much like Google Voice, that's a virtual phone number not tied to any device. That's the phone number I want to give you. It also happens to have a 416 area code, which is a Canadian area code, and I can't tell you how many times and how many systems I cannot supply that phone number to because it's foreign. Here's the other kicker. I had a business user tell me that we would only accept landlines, we would not accept mobile numbers, and he wanted to use a service that wouldn't allow that. Well, did you know that more than, and this is in 2012, more than half of American homes don't have or use their landlines. They have virtual numbers. They have VoIP. They have mobile only. And I've created a tiny URL here so you can go read this article. Most of the time, business users do understand that mobile numbers are the new thing but they also try to get local only numbers. Well, because long distance is now free in the U.S. and North America and certain industrialized nations, people don't change their phone numbers.
So if they move from New York to L.A., they still keep their New York phone number or people get vanity phone numbers, not that spell out something, but because they want a Las Vegas phone number. You can go get that. Lots of systems have tried to embed intelligence into the area code and I'm here to tell you that that is no longer true. Much like now Social Security numbers no longer have embedded geography in how they're assigned, telephone numbers, no longer can you assume someone's area code actually has any meaning to all to where they live, where they work, what they do, what hours of the day that you can call them. So heaven forbid that you still have an area code or VoIP number assigned to you that's in a time zone, five and a half time zones away because people will look up your area code and assume that. Now do you want your sales guys calling your customers at 5 a.m. to tell them about a great new offer you have to make their lives easier so they don't get paged at 4 a.m.? No. And that's actually happened to me where people look up that 416 area code and say, oh, it's on the east coast, I can call them during east coast hours. And a lot of people understand this. They have their own examples and yet they forget that as they're implementing business processes into an application that these business processes are sometimes based on reality that the world has moved on from. And I will have this argument all the time with people in IT and on the business side when I quote them these stats, people who still have a landline assume that everyone still has a landline. And so it's more of one of the best recommendations I have for not, for loving your customer is that you need to get out more. Someone should tweet that. So here's my example. One of the other reasons why people ask for home or work phone numbers is especially businesses is so that they know when to call you. 
And it's under this very old assumption that if I give you a work phone number that you can call me sometime between nine and five there. And if I give you my home phone number, that's when you can call me outside of those hours. That's no longer true. And not just because of the mobile phone number thing, but because of how the whole nature of work has been redefined. I worked with an end user once who wanted us to collect home and work so the numbers had to be different and they could not be mobile numbers. And I tried to explain to him that lots of people have home and work numbers where they're the same because they're self-employed, because they no longer pay for a home number and they only have a personal cell phone number, or because even now organizations are issuing mobile phones and no desk phones for people. So you don't have a separate work number from a mobile number. There's all kinds of reasons why home and work might be an important piece of data to collect, like for telemarketing purposes or debt collection where you're not allowed to call people at work, all important pieces of information to have. But you should no longer assume and your system should not assume that there's actually a lot of significance to the home and work phone numbers other than the cases I talked about. We can't use these to tell us when to call. What we should be doing is asking and developing requirements and processes and data about the proper time, the proper number to call you at a certain time. The other thing is that not all phone numbers are phones and not all phones are phones, so the iPad example. So this is one of my favorite ones coming up. Where do you live? Now, I spend a lot of time in the U.S., I work in the U.S., but I just happen to live and be employed in Canada. So the envelope on the left, this is hilarious. This is an official mailing from the U.S.
State Department involving some correspondence about my passport, and it's kind of hard to tell there, but that care of Canada, that's actually a sticker stuck on top of the transparent window on the envelope. What that says to me, and it's all because I had my Canadian address there, what that says to me is that the U.S. State Department doesn't have the proper processes in place to mail outside the U.S. You think, well, they're the U.S. State Department, they're supposed to be, you know, it's passports, they're handling U.S. citizen stuff. Well, except for the fact that they handle visas and that Americans do live outside the U.S., and think of the cost of having to apply this sticker onto this mailing because I have a foreign address. The one in the middle is also from an Association of Information Technology professionals, and not only did they have to correct for this, but they just wrote it on top of the transparency, and it turned out to be almost unreadable by the time it got to me, as well as the fact that you'd think if you were a global IT organization, you would understand that you have members outside of the U.S. My absentee ballot thing always comes with my county stamped upside down on my ballot, and I always wanted to know what the significance of that was. So the other big rant that people have outside the U.S. is people who call the field on the form postal code, but they make it integer for data quality purposes. Well, a lot of postal codes outside the U.S. are not integer, and that means that I end up using that zip code from Hell, Michigan, which is 48169, which also makes for a good fake zip code if you ever need to provide that, even though you would never do that. So what I say about international data, you don't have to do business in a foreign country, your organization, to have foreign data or international data, and just because you support international data doesn't mean your suppliers do.
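To make the integer postal code problem concrete, here is a minimal Python sketch. The function names and the sample values other than 48169 are made up for illustration, and the integer conversion only mimics what an INTEGER column would do; it is not taken from any system mentioned in the talk.

```python
# Sample values for illustration; only 48169 comes from the talk.
postal_codes = ["48169", "02134", "M5V 2T6"]


def store_as_integer(code: str):
    """Mimic an INTEGER column: return None when the value cannot be stored."""
    try:
        return int(code)
    except ValueError:
        return None


def store_as_text(code: str) -> str:
    """Mimic a text column: keep the characters exactly as entered."""
    return code.strip()


for code in postal_codes:
    as_int = store_as_integer(code)
    as_text = store_as_text(code)
    # A leading zero disappears in the integer form, and an alphanumeric
    # code cannot be stored as an integer at all.
    print(f"entered={code!r} integer={as_int!r} text={as_text!r}")
```

The text form survives unchanged in all three cases, which is why a postal code is better treated as an identifier made of characters than as a number you would do arithmetic on.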
So in the case of the State Department, I'm assuming that the State Department knows that U.S. citizens live abroad, but probably their supplier that actually prints and mails those things isn't set up to do it. That says to me that a good IT solution will have testing of data from end to end, from the time it's collected until the time it's presented to the customer. I have also had clients who swear up and down they don't do business with foreign vendors, and there I am sitting at the table as a foreign vendor, and I often have a hard time getting the checks sent to me because or paid electronically because none of their processes are set up to have foreign or international data. So the next one is, what is your date of birth? You'd think that was the easiest thing in the world, but at one point I went to get my Indiana driver's license because I was in the process of moving to Canada but hadn't moved yet and therefore could not get an Ontario driver's license. So I did what every female does, is I focused only on what my photo was going to be and I managed to get a driver's license with a wonderful photo and then I took it home and showed my mother how wonderful the photo was. Except she looked at it and she said, yeah, except your date of birth is wrong and I looked at that and yes, it was wrong. Well, what happened was is that in my vanity of focusing on my photo, wrong date of birth and I didn't notice it, that they had put it there, so what did I do? I asked them to change it and they said, we can't change your date of birth. Who has a date of birth that changes? Well, here's the problem with that. I needed my date of birth change and they told me the only way I could do that would be to go to court and get a judge to change my date of birth to the right one even though it was already right. So here's the problem with date of birth. They can change. Yes, people don't have their dates of birth change normally, but your system is not the people. 
It's an abstraction and this is one of the problems that data modelers get hung up on of trying to find the balance of modeling the real world versus modeling actually what the requirements are. The other reason dates of birth change is due to calendar changes and day changes and this does happen. It's rare, but it happens. And many documents might use multiple calendaring systems and therefore people may change how they confirm their date of birth. So a valid date for a date of birth is dependent on many things other than just our calendar and our time zone and our date. One of the other issues here is that the way I had to fix this was to drag in my twin brother, who was the other person in the photo with me, because at least I was in a small town in the U.S. where they knew that I had a twin brother and I was able to bring him in and have him show his driver's license and show that he had a different date of birth on mine. It was actually off by a month. And that allowed them, with that proof, it only worked because I had a twin that they could put in a manual change request for what I'm guessing is a DBA to go in and change the date of birth and then I had to go back and get another driver's license issued. And it was all because someone was overly focused on the true nature of the world versus the abstraction in the system. So even if your business says we can never change this data, if it's data about people, data about people changes. Or it gets captured incorrectly and you need to provide a process for changing it. In fact, under EU privacy legislation, and in some places in the U.S., customers can demand a correction to data and you are legally obliged to provide that correction. You don't have to let them do it. It doesn't have to be part of your natural process but you have to be able to provide that. And I see a lot of systems designed where that is not allowed. 
And we really want it to be an application process to change that data, even if it's an exceptional one. What we don't want, the last resort, should be someone going in and making a change on the database itself. It's risky and it may not ripple through all the other things that need to be updated at the same time. So the next one is my favorite one. What is your name? Oh my gosh, this is the most relevant, the most personal of all information, is what do you call yourself? And you can see from these examples, the one there in the middle, that Air Canada boarding pass, Karina. This is the concatenation problem I was talking about. In that example, this is my first name being concatenated with a first initial. And I'm not sure if we're going to talk about that initial problem as well. But when it gets passed to my boarding pass, this is what prints. They can bring up my itinerary and my reservation and my ticket. And it's Karen. But what gets printed on my boarding pass is Karina. Now, who cares what gets printed on your boarding pass? Well, TSA cares and other airlines care. The kiosks and the 24-hour check-in in advance, they all care. And for most of my check-ins, I cannot do that. Sorry, I'm having a technical problem. Karina is not my name. It doesn't match any of my legal documents, and it doesn't match my ticket. And therefore, what's happening is that this is a data quality problem that I cannot fix. So one of the problems with my last flight that was delayed was that the person checking me in insisted that I call my travel agent to fix it. But my travel agent can't fix it because it's proper on my reservations and ticket. It can only be fixed somewhere in this integration point between the two systems. So I am helpless to fix that. The odd thing is that the TSA and other security people don't care because they see this all the time. And you can see that on those U.S.
Airways and United ones down there at the bottom, that sometimes my salutation gets concatenated, sometimes another middle name gets concatenated on it. They see this all the time. So I consider this a real issue also with security because we are violating something that's supposed to be a security thing of your name matching your boarding pass. The reason I mentioned the two initials is that when I went to get my first passport, the woman working at the passport office in Los Angeles insisted one of my many middle names or all of my many middle names had to be listed on my passport in alphabetical order. Now my middle names, depending on who you ask, are Maria, Maria de Guadalupe, and there's all these middle names that I use. And so I was 20-something years old and I got this passport and it had am, that's the first name there. And I think, who the heck cares what the middle name on your passport is? Now, you fast-forward a few decades later and you find out that everyone cares what the name on your passport is because it's used when you board. It was used for my immigration documents in Canada. I now have several sets of ID that seem to have random names on them and it's all depending on what rules they wanted to enforce. Sorry, I want to go back to that slide one more time. This one data anomaly on an official document has now caused me, as a customer of that, many decades of pain. The other issue with my name is I have one of those funny names that actually has an accent. So if you look in the upper right-hand corner, you can see a friend of mine emailed me with an accent over every character just to make fun of it. But these extended characters traditionally have been ignored in underlying IT systems because it requires column lengths to be twice as long where you have to use NVARCHAR or other encoding schemes to take care of them. They have to be printed and escaped and everything.
And that means that we are telling people whose names are different, because they have these either special characters or special accents or extended characters, that it's just not important. And I think that's wrong. And yes, it's a performance trade-off, but what we really are saying is if you have a weird name, like an apostrophe in your name or an accent, no one should care. And one final thing on that slide was that one of my property tax records actually has my husband's last name as my last name, because who in the heck gets married and doesn't have the same last name? And that carried over into that system and in order to get it fixed, I need to go to court, which I haven't done. This is my Starbucks name, Kitty. And by the way, everyone should have a Starbucks name. And this is how it gets spelled. Now Kitty is kind of a well-known name to spell, and yet this is what happens. And it's hilarious because this isn't important, but can you imagine how your call center data is mangled because Kitty is spelled all these different ways. Finally, look at this example. What do you think that name is? That is the name Lopez after it's been sent from a retailer to a distributor to a shipper to any of a bunch of other things, and it's all been escaped and coded. It has been tortured into being that. Then we come to how you've modeled names. And I've picked on just a data standard from NIEM. This is for tracking terrorists and all kinds of things. And they're making the mistake that most of us do where they just have given name, maiden name, middle name, last names, full names. Names are much more complex than this and we need to understand that. The order of names should not be mixed with the content of names. So format is different than that. Names are one of the more difficult data modeling problems even though people spend about 15 seconds on them. Not all name parts are the same.
If any of you have spaces in your name or use prefixes in your names like van or de or von, you know what I'm talking about. Not all name rules are the same depending on cultural differences, postal mailing differences, formality. It's a real problem. And it's not just international data but certainly that makes the complexity of it all much worse. Before I go on I just want to remind you if you have questions you can put them into the chat or Q&A. And so there are more myths about names than names themselves. This is what I'd like you to quote. I don't think that's really true but that's what it feels like. In fact, there's this wonderful set of blog posts, which I've created a tiny URL for, for name myths. They are falsehoods that not just programmers believe about names but business users, data modelers, DBAs. And it's a fascinating list of all kinds of things that especially North American IT people believe about names: that people have exactly one name, that people's names fit within a defined amount of space, that they never change or that they only change because of a certain set of events, that all of them can be written in ASCII or any one character set. This is common in Asian countries where part of their names are expressed in one character set and other parts in another. That people's names are case sensitive or that people's names can be all uppercase, that we sort names by the same standard everywhere, that everyone has a last name. So it's funny. If you go and look at your data models, I bet you you will find that you have first name, middle name and last name and that people's data is being tortured in those columns. It's probably the number one spot where our requirements are overly constraining, like for instance assuming everyone has a middle name. So for instance in Hispanic cultures it's common for people not to be born with a middle name. We acquire lots of names as we go on. But it's common not to be born with a middle name.
And yet so many systems, especially if they're official, require a middle name. It's also common in certain cultures for children not to be given first names until a certain point in their life. So what we call first name, the rest of the world calls given names. It's common for people to not have a surname or a family name in Indonesia and it's not just, as one vendor told an Indonesian friend of mine, only people living in huts who don't use computers. It's actual people who work and shop and buy things who were not born with a family name. Now what's happened to all these people, so people like my father who doesn't have a middle name, they either make up names or they put the word none and then all their correspondence comes with none or N/A on it. There's all of these things that happen when you over constrain name data because you're trying to get all names to fit in the world in which you grew up, in which you were born, in which you understand. And you end up with much worse data. The other is the messing around with extended character sets and not supporting them. And if any of you have last names like O'Hare or O'Reilly, you know about the apostrophe problem when you try to do things. Lots of people have names with hyphens and those aren't allowed and let me tell you the number one reason that I'm given a lot of times by people is that oh, we have to strip out all these special codes because it's a security risk to allow them. Well, you know what, it is a security risk to allow anything in an input field. But we need to recognize that these are part of people's names, those special characters and punctuation, even though they may not be punctuation, are part of their name. And we're basically telling customers all the time, we don't care what your name is, we need to use a fake name for you because security or because performance. All of these drive me crazy. So one of the questions I get asked all the time, what do you do?
Well, you know, you sit on a plane and you're a data modeler and someone asks what do you do? I mean where do you start? Do you say oh, I work with computers, you know, and you try to fill out what they are. But that's not the problem, even that I'm talking about I have rarely ever seen in a drop down list of professions data architect or data modeler. So then you're stuck with gosh, I don't know, should I choose dba, should I choose analyst, should I just choose architect and hope that people know that that's what I'm talking about. I even don't see it in names that in lists that are these detailed and this is a standard list of professions that data modeler or data architect is not listed in, but look how specific it is, set designers and actors and sometimes cheese makers are in this list. You know, we're not in this list. Now, what does that mean? Is it important that data architect get collected or whatever? Not sure why you might be collecting profession. Usually it's done, especially if you work for a technical company or a software vendor or an IT company, it's done because someone is trying to figure out do you have the buying power? Are you the person we should be talking to or should we be talking to someone else? And that's fine for that list. But I also see people using these standard lists that may or may not be updated that are grossly out of date or that don't even have the target professions in their field because they were lazy and wanted to use reference data that someone else put together. So if you ever find a drop down list that isn't from DAMA or from your database user group that has data architect or data modeler in it and I'm not going to count data scientists, then I'd love to hear about it. Because here's one of the things about collecting professions data. Kids today, see now I sound like an old person, kids today will find jobs. Your kids, your grandkids will find jobs that don't even exist yet. 
There will be industries that don't even exist yet. And that data is going to be a key part of them. And the data profession, so I mean the people who put together this reference data isn't keeping up with that. So maybe if you're collecting what's your job title or profession and doing it not as free form text, you should look at to see if it's up to date with the new world of data. Not just throwing in data science but maybe putting in analytics analyst and developer for analytics and integration specialist and all of those things. Yeah, I love that Barbie disposable crew member. Yes, of course. So what should you do about all of these problems? Well, this is one of the hardest ones. Your data models and your database designs should anticipate international data. This is the hardest thing I have as an architect that I get the most pushback for. Sorry. I don't know why all these auto advances are here. But anticipate international data. You shouldn't assume anything based on other data like the area code issue unless you have confirmed that that is always true. All of your systems should offer data correction processes and processes. You need to spend time as a data architect learning about the outliers and then if you have other recommendations, I would love to hear from you about what they are. So what you must do to love data is fight myths about the data world. So I've only given five examples of myths that people believe about phones and phone numbers, myths that people believe about names and I could probably do a whole presentation on the myths of names and I probably should do that for some event. Myths about professions, myths about documents, all of those things. Your design should support the full data life cycle so deletes updates and inserts. You need to test end to end, not just the unit test. 
So unit tests or test-driven development are great, but the tests should also go all the way through the process, including any paper, heaven forbid, data artifacts that are created, whether they're cards or passports or boarding passes or any of that stuff. And they should test for all these edge and outlier cases that your developers will hate you for testing. They really should. I want you to memorize this. This is what I use when people complain about the data models I've created for names, contact mechanisms, phone numbers, postal codes, addresses: if you want your database to be simple, go out and make the world simple, then come back to me. Again, this is one of the harder parts of data modeling and design because, as I've talked about in other presentations, you know, these highly personal data, names, addresses, contact mechanisms, gender, sex, marital status, all of those things, are highly personal and probably the worst analyzed and the most over-constrained of all the data in the world. So we get the transactional stuff because that's sort of not a concrete concept. But with data about people, we can say that, oh, our users don't care that you think that there are more than two genders in the world. Well, yes, your users don't care. And when you say that, you're making a statement to your customers about how much you love them and their data. Or, we don't need to allow special characters in names because it impacts performance. Do your business users know that you're intentionally misspelling or rejecting parts of people's names for performance reasons? They might be okay with that, especially if you bring them hard data on what the difference is. But they may not be aware that we are making those decisions for them. That's especially true for people who have all those funky characters in their names. 
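The edge-case testing described above can be sketched roughly as follows. This is a minimal illustration, not anything shown in the webinar: it assumes an in-memory SQLite database, and the table `customer_name`, the helper functions, and the sample names are all hypothetical. It only checks that a name comes back from storage exactly as it went in; a real end-to-end test would continue through every downstream system and printed artifact.

```python
import sqlite3

# Hypothetical outlier names used as test data (not from the talk);
# extend the list with cases drawn from your own customers.
EDGE_CASE_NAMES = [
    "O'Brien",       # apostrophe
    "Anne-Marie",    # hyphen
    "José Núñez",    # accented Latin letters
    "Nguyễn",        # stacked diacritics
    "李小龍",         # non-Latin script
    "van der Berg",  # embedded spaces and a lowercase particle
]


def make_test_db():
    """Create an in-memory database with a hypothetical name table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_name ("
        " name_id INTEGER PRIMARY KEY,"
        " full_name TEXT NOT NULL)"
    )
    return conn


def round_trip(conn, name):
    """Store a name and return it exactly as the database gives it back."""
    cur = conn.execute(
        "INSERT INTO customer_name (full_name) VALUES (?)", (name,)
    )
    row = conn.execute(
        "SELECT full_name FROM customer_name WHERE name_id = ?",
        (cur.lastrowid,),
    ).fetchone()
    return row[0]


def find_mangled_names(conn, names):
    """Return every name that did not survive storage unchanged."""
    return [name for name in names if round_trip(conn, name) != name]
```

Running `find_mangled_names(make_test_db(), EDGE_CASE_NAMES)` gives the list of names that were altered along the way, which is the kind of hard data that could be brought to business users when a constraint on names is being debated.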
The other type of complexity that happens is this: I have a reference model that I've done for all the complexities of names and name components that I need to get published somewhere. But if I showed it to you, every developer and DBA would just cringe, because it involves about eight or ten tables just to do first, middle, and last name. Now, that doesn't mean I'm saying everyone has to include that. But as long as you're aware of the complexities of names and can help guide the business into choosing the right requirements, and which requirements of the real world they'd like to ignore or maybe mitigate, then you can do that. So I have a question: how do you balance the need to allow flexibility for all the different options for names, addresses, and phone numbers against the need for data validation? Well, that really is the heart of what I'm saying. For instance, requiring every customer to have a last name. That's probably going to be how we do things going forward for a long time. Ideally, we might accept the trade-off of not making that column nullable, because the chances are we'd end up with more situations where a person actually had a last name and it just wasn't captured than people who don't have a last name. But what we could do is use one of these faux nulls, a fake indicator in the last name column that is strictly managed, and not just any old crap, like sometimes spaces, sometimes stars, sometimes "none", sometimes the word "null" typed out, to indicate that this person actually has no last name. We could do that. The normalization rules say we would create a different table for people who have no last name. That's certainly one of the options, and it also leads to more joins, which everyone seems to think are impossible. We have these technical ways to deal with it, so you're right, it's about finding the right balance. 
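One way the strictly managed indicator described above might look is sketched here. This is an assumption-laden illustration, not the speaker's reference model: it uses SQLite through Python, and the table `person`, the column `last_name_status`, and the three status codes are invented for the example. The design choice is to keep `last_name` non-nullable while moving the "why is it empty" information into a separate, constrained column, so that stars, spaces, or the typed word "null" cannot pass as an indicator.

```python
import sqlite3

# Hypothetical status codes; the talk does not name any.
NAME_STATUSES = ("PROVIDED", "NOT_CAPTURED", "HAS_NONE")


def create_schema(conn):
    """Create a person table whose last name is NOT NULL but explicitly qualified."""
    conn.execute(
        """
        CREATE TABLE person (
            person_id        INTEGER PRIMARY KEY,
            first_name       TEXT NOT NULL,
            last_name        TEXT NOT NULL DEFAULT '',
            last_name_status TEXT NOT NULL
                CHECK (last_name_status IN ('PROVIDED', 'NOT_CAPTURED', 'HAS_NONE')),
            -- A non-blank last name is present if and only if the status is PROVIDED.
            CHECK ((last_name_status = 'PROVIDED') = (length(trim(last_name)) > 0))
        )
        """
    )


def add_person(conn, first_name, last_name, status):
    """Insert a person; the database rejects inconsistent name/status pairs."""
    cur = conn.execute(
        "INSERT INTO person (first_name, last_name, last_name_status)"
        " VALUES (?, ?, ?)",
        (first_name, last_name, status),
    )
    return cur.lastrowid
```

With this sketch, `add_person(conn, "Karen", "Lopez", "PROVIDED")` and `add_person(conn, "Teller", "", "HAS_NONE")` both succeed, while a blank last name marked `PROVIDED`, or a filler value such as `"none"` marked `HAS_NONE`, raises `sqlite3.IntegrityError`. The separate-table option mentioned above would be an alternative to the status column.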
The last name thing, I like to use it because it's definitely an outlier. But there could be cases where perhaps you're a medical establishment and you don't know the person's last name at all yet. In that case, for that column in that particular table, which might be not just the person table but a particular subtype of it, maybe you use the indicator, or maybe it's nullable, to be filled in later. But then we go to middle name. It's much more common for people to not have a middle name or to just choose not to supply it. Making middle name required in cases where that's definitely not true is, in most cases, probably a mistake, and it's not going to help you validate data. So you really have to do that cost, benefit, and risk assessment for every single column. So any other questions here about this? Because this is what I don't know. When I call in to a company and they ask me what my name is, I'm not even sure what name I have to give. Usually it's Karen Lopez, sometimes it's Data Chick, sometimes it's Kitty, and those are alternate names that I have used and I have provided, so that's my own complexity I'm bringing to the problem. But heaven forbid, if you have one of those weird foreign names that's been transliterated into our alphabet, you don't know how it's spelled in their system because it might have been spelled ten different ways. I don't know what my address is because I don't know if you allowed postal code to have letters in it. I don't know all these things. I don't know what I chose in the drop-down for what I do. I don't know where I live because I don't know if you could use my address as I provided it. I don't know what phone number I gave you for work when you forced me to give one that was different from home. 
I probably just made one up, and I'm not going to know until you tell me, because your systems forced me to give you bad data, or forced your customer service rep to provide bad data for me, or maybe it got messed up in the integration from one vendor to another. I'm not sure. So you all were great about this. We got fewer questions than I anticipated. One of the things I wanted to talk about is that I think we as a data modeling community should be talking more and more about these outlier issues, talking about internationalization and globalization. Too many organizations, especially in the U.S., think that they don't need to know these things because they only do business in the U.S., and yet we have people who have international data even if they were born in the U.S. We have people who move to other countries. We have people, vendors, who are located in all kinds of places. We have examples of data getting miscaptured that then needs to be corrected. We have cases where our suppliers aren't allowed to do all this. We should be sharing these stories about these glitches and perhaps even sharing reference data models for how to deal with them. At least a reference data model that says, hey, if you were going to model the real, real world, this is what it would look like. So one of the references that I like to make for this is a guy in the Netherlands, Graham Rhind, R-H-I-N-D, who writes and blogs a lot about internationalization and globalization. His specialty is postal addresses, but he also writes and talks about some of these name issues and other things as well. The other reference I like to give is Len Silverston's books on universal data models, which touch on some of these topics, and they're a type of reference model. They probably don't go as far as what I would like to see, but they do talk about those things. And finally there's a question, is this being recorded? Yes. 
And yes, you'll have a link to the recording. You'll get that in the mail as a follow-up next week. And I'm not seeing any other questions. One last call for questions, and I'm not seeing anything on Twitter. No, not seeing anything else on Twitter. Shannon, I think that's all we have. Well, everyone's in a summer coma today. It's quiet. I think so. I'd so love to hear, as I posted in the chat, if you have any examples from your own data, especially for something that I haven't addressed here. I'd love to hear about it. That'll change the way you teach DB design. That's perfect. That's the influence I want. We should start early with people. Because one of the challenges I find as I accredit college programs in the computing world is that most of the database design courses teach all the bad design that I just talked about. Because books have to be simple, and slides have to be simple, and that's what they talk about. Is there a sample data model for names like the one you described earlier? I'm not aware of one other than the one that I have that I really need to get published. If I do that, I will definitely blog about it at Dataversity. I need to go find that and dig it up. And you're struggling with this at the moment, Michael? Yep. I'd say every single organization struggles with this. I have never come across an enterprise data model that even began to address some of these issues. I think that's it, Shannon. Fantastic. Well, Karen, thank you so much as always. And as you mentioned, I will be sending a follow-up email within two business days, so for this particular webinar by end of day Monday, with links to the slides, links to the recording, and any additional information requested. I do believe we have Len's books that you mentioned in our Dataversity bookstore, so you can check them out there. And so I will stop the recording, and if you want to continue on, Karen, we can continue the discussion. Yep. Yeah.