Welcome. My name is Shannon Kemp and I'm the executive editor for DataVersity. We would like to thank you for joining today's DataVersity webinar, Big Challenges in Data Modeling: Data Modeling Design Problems, sponsored today by CA Technologies, makers of ERwin, and moderated by Karen Lopez. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A section, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #BCDModeling, for Big Challenges in Data Modeling. As always, we will send a follow-up email within two business days containing links to the recording of this session and additional information requested throughout the webinar. And I'm very pleased to introduce to you the moderator for the webinar series, Karen Lopez. Karen is a senior project manager and architect at InfoAdvisors. She specializes in the practical application of data management principles and is a frequent speaker and panelist on professional data issues. She's a Microsoft SQL Server MVP specializing in data modeling and database design. She's an advisor to the DAMA International Board and a member of the Advisory Board of Zachman International. She wants you to love your data. Joining Karen this month are three esteemed guest panelists. Donna Burbank is VP of Product Marketing at CA Technologies. Donna has more than 15 years of experience in the areas of data management, metadata management, and enterprise architecture. She's worked with dozens of Fortune 500 companies worldwide in the U.S., Europe, Asia, and Africa, and speaks regularly at industry conferences. She has also authored a number of data-centric books, including Data Modeling for the Business and Data Modeling Made Simple with CA ERwin Data Modeler r8.
David Hay is president of Essential Strategies Incorporated. David has been in the information industry since it was called data processing, and he has been producing models to support strategic and requirements planning for more than 25 years. Dave has worked in a variety of industries, including, among others, banking, clinical pharmaceutical research, and all aspects of oil production and processing. He has worked on various aspects of defining corporate information architecture, identifying requirements, and planning strategies for the implementation of new systems. And Tom Bilcze is Lead Database Designer at Westfield Insurance. Tom is a Lead Database Designer at Westfield Group, a super-regional insurance, banking, and financial services group of businesses headquartered in Westfield Center, Ohio. Tom has 36 years of IT experience, with over 24 years in the data management arena. Tom is the president of the CA Technologies Modeling Global User Community. And if you're at the Enterprise Data World Conference next week, you can meet them all in person. So welcome, Karen, and our panelists. And with that, I will turn it over to Karen to get us started.
Hello and welcome, Shannon. Yeah, thanks, Shannon. That's great. You're doing a great job here making all these things happen. I'm always in awe. If you're having any sort of technical issues, audio issues, screen issues, or something, feel free to pop those questions over into the Q&A section, and Shannon will help you right out. If you have a question for the panelists, also put those in the Q&A, and we're also trying to keep an eye on Twitter because all of us are very experienced multitaskers, I assure you, and we're all on Twitter. That's great. I also wanted to thank CA Technologies again for sponsoring this, because that's what makes these things happen.
And I love it if you tweet about these things, even if you're a lurker on Twitter. That really helps us, and using that hashtag helps us follow along: BCDModeling, as in big challenges in data modeling. And I've already started to put some tweets out there, and so will our panelists, because they're great multitaskers. Guys?
Yeah.
So today's topic is about data modeling design problems. And, you know, the title's pretty good. I really want to talk about the challenges we have as data architects and developers and DBAs in addressing some of the more common design issues, patterns that happen as we go about trying to not just do logical data models, but to build models that can actually build solutions. And whether those solutions are homegrown database designs or XML messages or anything like that, I think it's really important that we talk about some of these things that we're doing. So I want to start off, though, with the first question, because this is one of the contentious things that I deal with. And I want to start with Tom first. How much design do data architects do in your experience, both your experience at your company and also what you hear in your ERwin user group?
It's an interesting question. Many years ago I probably would have answered a little differently than I would today.
Oh, yeah.
I can't start with a blank sheet of paper anymore. And I think that's a good thing. And I think, like any experienced data modeler, you have your own repertoire of, I would call them, patterns and so forth. And I also use an industry model in my work. And so in terms of design, make no mistake that there's design involved even with an industry model. It's just that it kind of gets that grunt work out of the way and gets you started.
So where's the trust in your organization normally between sort of the modelers and physical modelers?
Do data architects do physical models? One of the things I tell people is that most of the physical models I'm creating from scratch, like that blank-sheet-of-paper thing, are not for building databases so much. They might be one-off tables, extensions to packages or something, but most of mine are for integration layers: staging databases or ESBs or something like that.
Yeah, my role works pretty well in my current position, in that I do the logical model and I do the first cut of the physical model. And it's kind of the thing I enjoy. It's something I like. I sort of like knowing just enough about that database. It's kind of, you know, a dance along that line where sometimes I step on a DBA's toes. Not real often, but I think I put my stamp on that design and involve them. I think that's a real critical thing as we're crossing from the logical to the physical layer: logical modeling people need a foot in some of that physical world, because earlier you made the comment about models that actually deliver solutions, and that was a really good choice of words because, you know, your logical model is going to have to actually deliver a solution.
So, David, what do you think? What's been your experience about how much physical design data architects do?
Well, in my particular case, I'm pretty much out of the physical design. I have pretty much all my work cut out for me to try to get a model that really represents the nature of the business. And as often as not, that gets turned over to other people. The point is that I charge a lot for the kind of modeling I do and they don't want to hire me for that much to do physical design. So they hire people who are much more competent at that than I am. There is the opportunity to take a model and generate a physical design and get all the foreign keys and the constraints and all of the stuff that goes along with that. And I say, um, maybe not.
And I teach people who are much better at that than I am.
The point is correct, but go ahead. I just want to ask you, since you brought up foreign key constraints, and maybe value constraints and valid values, what else can I think of?
Well, it's not so much that. It's that if I have a couple of levels of subtypes and a couple of relationships to the supertypes, then those all get replicated to the subtype tables. You have to track the naming conventions such that you can figure out what exactly it is that they're pointing to, and this gets very tedious very quickly. Basically, it's on me.
Right. So the reason I was just pouncing on those terms is one of the things, like Tom talked about doing first-cut database design, which is how I describe the physical designs I do. I'm certainly not a DBA, and I have to work with so many target DBMSs that I couldn't possibly have enough time to master all that knowledge at a professional level. Right. It's just more of a separation of duties and separation of resources, I call it. But one of the questions I run into all the time is how much is enough? So I've worked with data architects who feel that even picking a primary key, a candidate primary key even, or guessing at a primary key is way too physical for what a data architect should do, all the way to others who think maybe it includes things like stored procedure coding and being able to figure out how to partition the table and making decisions on whether a foreign key should be enforced.
The decisions that have to be made, as I see it, before you can even automatically generate it: first of all, how to address subtypes. This depends on how they're being used. If really everybody is looking at the supertype and that's what all the queries are going to be, and there isn't much variation in the subtypes, then you do it that way.
But if the subtypes are all very specialized and there are lots of different people using them, then you do it a different way. This is not a technical issue in the sense that you're dealing with the nuts and bolts of the database, but you're saying, okay, there are usage issues that have to be addressed in the design. And I use a lot of computed attributes, which I discovered a long time ago are a really profound way to be able to describe what's going on in the model. And I learned the first time I had software that did that: well, it's really nice if you have software where you put a formula in there instead of the value and a query calculates it automatically, but if you have a lot of those and they're all dynamic and you ask a question, suddenly the lights dim. So there's a design decision: does that calculation happen when you're bringing the data in, or is it something you can do dynamically when you do queries? And it's those kinds of trade-offs that are important to design.
I think so too, and I think a lot of data architects don't understand all of the physical trade-offs. Like, we design beautiful logical models and first-cut physical models that make DBAs just want to run for the eye bleach. So Donna, what's been your experience with how much physical design data architects do?
I would say I do see things all over the map, and I think it depends on the size of the company and, I think, changing roles; I'm seeing more and more people kind of wear a lot of hats. But typically the data architect is at the business level, definitely in the logical layer. I think ideally it would work in the way that Tom described, where maybe there's some light design, but I definitely think for the majority it's almost two separate camps, and I think in some ways that's a risk. You kind of hit on it, and maybe I'll create a little controversy on the call.
Is it too much to get this perfect business design where then, in the extreme cases, the DBA goes, yeah, whatever, that's too hard, I'm going to throw it all in one table because it's easier to build one table rather than seven, and all that goes away? So where's that just enough? And in a lot of ways we've matured as an industry. I like Tom's comment; I think a lot of these patterns, and I know David has done a lot of work on patterns, we've kind of got some of that right. But I think one of the things I can criticize us on is maybe we spend too much time in analysis paralysis and therefore it's not useful. So how can you do that just enough? I believe you can have that conversation.
So David, did you have a comment?
I think that process of translating between the two is the heart of it. And what you have to do is get the business person or the business analyst together, and actually there are some business people involved too, with the modeler on the one hand and the designer on the other hand, and there are questions that you ask the business people: how are you going to use this? And there are very specific translations you can make if they're going to use it this way as opposed to that way. And you can just sort of go down the list and say, well, all right, how do we do this? And it's how they use it that will determine the answer to that question. And there's a process in the middle of doing that translation.
Yeah, I think we're used to having business people involved in the logical design. I think it's less so in the physical. I think it's equally important, I agree. But I don't know if we always do have that how-is-it-going-to-be-used discussion. Tom, do you do that in your company with business people?
How it's going to be used from the developer and DBA perspective, or the business perspective?
The business perspective.
The business perspective, we get that for sure.
At the same time, one of the faults we have as logical data modelers is we don't involve the DBA up front. I've also seen DBAs say, oh, I don't necessarily want to be involved in all these meetings, but they still want to be involved. It's an interesting dilemma. I might add to the comment earlier: if you start moving over into that physical world, I have to watch myself so that my logical designs don't become too physicalized.
Actually, those are all great points. I have a blog post on Dataversity about the Mason-Dixon line of the Zachman Framework, about trying to draw people across. There's a line there that I see, a horizontal line, where people are the most comfortable working. There are some really wonderful people who can work in the data column all the way up and down, or in a row all the way across. But for the most part, most people either like to think about building stuff or they like to think about solving business problems, and we need to find that right trade-off along that Mason-Dixon line. For those of you who might not know, the Mason-Dixon line is a cultural divide in the U.S., used to describe different belief systems and everything. It's a point of view, as we've got people up here laughing and people down there below the line laughing. I think pulling people across those lines is good, to get people to stretch themselves and understand. I've been in meetings where it is kind of embarrassing if data architects can't look at an actual, let's say, CREATE TABLE statement and read it. Again, there's a certain amount of DDL literacy that we data architects need: to look at DDL and be able to understand it, maybe not choose the right way to do all that stuff, but to be able to read it and participate in that discussion. So now that we've spent a little time talking about the nature of some of the issues, the point of view, I want to switch into just a few examples of some of the more common design issues that I deal with.
One of them that comes up a lot for me is when a data architect has already designed something very beautiful in a logical model, like a recursive relationship for a hierarchy, to represent a ragged hierarchy. A ragged hierarchy is one, if you think of an org chart, that has multiple levels, but not all the same levels. So you might have the CEO having some other C-level people report to them, and then you might have some VPs, some senior VPs reporting to them. But in one of those branches, maybe the VP has no more people reporting to them. Let's call him the VP of Special Projects. The VP of Special Projects has no people reporting to him. But then you go on and you have VPs that have HR, finance, IT reporting to them, and they might go down another 50 or 60 levels. But eventually they all stop, and it's called a ragged hierarchy because it's not just a pyramid-like hierarchy. And in a logical model, we can design those and have this nice recursive relationship pointing back to its parent. But in the real physical world, that's really difficult.
You have just touched on one of my biggest gripes about dear old SQL. I spent many years in manufacturing systems, and the product structure was the guts of any manufacturing system. And a couple of programmers figured out how to deal with it, which was fine. They used a product in the early 80s that had manufacturing system functions. And one of them was how to explode the product structure. It takes care of exactly what you said. It goes down each branch as far as it goes and it quits, and then it goes to the next branch and goes as far as it goes and quits. And it was wonderful. And that was exactly the way the world should work. And I must confess that gave me bad habits as a data modeler, because I assumed that you can do that. And I started to work with Oracle, and I discovered that Oracle ostensibly has a command for that. I can't remember what it was, but it lets you do that kind of an explosion.
Oh, by the way, you cannot actually do joins with those structures. So if you're willing to have a product structure that has just a lot of part numbers, you're great. But if you'd like to have part numbers with the description, you can't do that.
So one of the issues, David, is that with these, you're right, exploding them is also expensive and everything, but the real issue is where you need to insert a level. So in the org chart example, if you have departments reporting to divisions, and all of a sudden someone came along and decided we need zones in the middle there, the cost of now updating all those recursive relationships can also be a problem. And so, yeah. Well, so there are all these tricks for how to physically implement one of these hierarchies. And one of the great resources for this is Joe Celko, who has a book just on this, and it's called Joe Celko's Trees and Hierarchies. I use that all the time because it has all these ways of dealing with not just modeling a hierarchy, but modeling a hierarchy and building a solution that will be used the way the hierarchy has a tendency to be updated.
Right, exactly. So if you're going to have lots of inserts in the middle, do it this way. If you're going to be just updating the values, do it that way. The problem is that the tools are not as good as they should be, is my conviction.
Right. I'm not going to blame the tools.
You know, your point is correct, but you're sitting here saying, I know it can be done. And it's a pity that I had the experience that it actually can be done, and it can be done well, but just not in SQL. Interestingly enough, and I'm dropping a name here, I once had lunch with a doctor. And I groused about that. Why did SQL not have a function that lets you do a recursion like that? And he said, well, actually it did. When he designed it, there was an argument you could put on a SQL statement that was, okay, just explode this as far down as it goes.
But the people building the system thought, nah, that's too clever or too sophisticated or something. And they just didn't include it in the product, and it's never been there since.
There are some commercial RDBMSs that have hierarchy features. I'll just call them that. I'll be very vague. Okay. So for storing, retrieving, exploding them and everything, and I don't have a lot of experience with them, but this comes back to the point of, should I make sure that I understand the difference between just, like, to a business person, I can describe a hierarchy in a data model. I mean, very easily, very commonly. Yeah. Yeah. When it comes to implementing it, that might not be the solution.
Exactly so. And this is where designers earn their keep. I have respect for that.
Donna, Tom, do you have anything to add to this?
Well, I've definitely experienced that problem firsthand. And as you say, the first time some realignment comes, then you pay, right? Because of that realignment. And, you know, I haven't found an ideal solution for that, other than flattening the data, you know, and things like that.
Well, if you have the model say it's an organization, and you have the structure of the thing that can, you know, run around through it for as many levels as you want, then if you insert something in there, that's not a big deal to that model. But depending on what kind of trick you use to implement it, then you're right, it may be an issue.
Yeah. I guess that's, of course, the importance of that logical model, because technology changes, and sometimes the negative of implementing for a specific physical solution is that it masks the business need. So this is the need. This is how we have to do it now. Hopefully if something better comes along, we can implement it better. That's fine, but let's not mix that up: that's not how the business wanted it.
Exactly. Good point.
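As an illustration of the hierarchy discussion above, here is a minimal sketch of a ragged org chart stored as a recursive (parent-pointer) relationship and "exploded" with a recursive common table expression. It assumes a DBMS that supports `WITH RECURSIVE` (SQLite, via Python's standard `sqlite3` module, is used here). The table `org_unit`, its columns, and the function `explode_hierarchy` are hypothetical names chosen for the example, not anything the panelists described.

```python
import sqlite3

def explode_hierarchy(conn, root_id):
    """Return (depth, name) rows for root_id and everything below it."""
    sql = """
        WITH RECURSIVE chain(id, name, depth) AS (
            SELECT id, name, 0 FROM org_unit WHERE id = ?
            UNION ALL
            SELECT c.id, c.name, chain.depth + 1
            FROM org_unit AS c
            JOIN chain ON c.parent_id = chain.id
        )
        SELECT depth, name FROM chain ORDER BY depth, name
    """
    return conn.execute(sql, (root_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE org_unit (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES org_unit(id)  -- recursive relationship
    )
""")
conn.executemany("INSERT INTO org_unit VALUES (?, ?, ?)", [
    (1, "CEO", None),
    (2, "VP Special Projects", 1),  # ragged: this branch stops here
    (3, "VP Finance", 1),
    (4, "Accounting", 3),
    (5, "Payroll", 4),
])
rows_before = explode_hierarchy(conn, 1)

# Insert a new level ("zone") between VP Finance and Accounting.
# With parent pointers, only the direct children are re-pointed.
conn.execute("INSERT INTO org_unit VALUES (6, 'Zone East', 3)")
conn.execute("UPDATE org_unit SET parent_id = 6 WHERE id = 4")
rows_after = explode_hierarchy(conn, 1)

for depth, name in rows_after:
    print("  " * depth + name)
```

In this parent-pointer layout, inserting a level touches few rows but every explosion is a recursive query. Other physical layouts (for example, storing a materialized path or a flattened level-per-column table) trade in the opposite direction, which is the kind of usage-driven choice the panelists describe.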
Now, I want to move on to another one. One of the other things that I run into is one of my hardest design decisions. It's actually a logical modeling decision, too, but where it really becomes important is on the physical side. It's this trade-off between highly generalized models and very, very specific models. And by that, I mean, a specific model might have an entity or table for invoice, shipping notice, order, receipt document, et cetera, or you might implement a generic concept of, like in one model I work with, it's called an inventory control document, which could be subtyped into those things. But you might generalize it because all of those things have a high number of not just attributes and values in common, but also very similar workflow requirements, like they're used very much the same. So it would be a generalized implementation if you have inventory control document, versus a separate entity or table for everything else. So what's your experience of helping people try to decide how generalized they want to go, or what are some of the issues with going generalized versus specific?
You know, I'm a big fan of the conceptual data model, and one of the folks in the questions had talked about reuse, and to me the key is to have that, whatever you call it, that generalized layer. It really, to me, is a separate model in the layers. There's this thing called invoice, and that has a definition, and there are attributes about it that are the same that might be reused in other models. And even at the logical level, you may instantiate that with things like attributes, and those might be taken off and made into separate tables in different physical databases, but the core of invoiceness, or whatever example I use, that should be the same. ERwin has the idea of this design layer architecture where you can literally have these separate models that derive into others.
So you can have this central logical model, for example, and just take that differently to Oracle, SQL Server, Teradata, but then you don't lose that mapping. So it's kind of the best of both worlds. Yes, these are the common attributes. I implemented it differently on DB2, but it's still the same thing.
Oh, you brought up an even different issue, which is actually also a true data modeling design issue: how do you support differences between a logical expression of your data model, your logical model, and all the several physical ones? Also a really important design consideration. But what I was really talking about is, in one particular physical model, how do you decide whether to take a generalized, more abstract approach to designing something, or stick with the specific tables. And I think over the years, we've definitely changed. When I first started modeling, the trend was very specific. You know, even if you had subtypes, you had lots of subtypes, you had very specific things. With the generalized approach, in my example, inventory control document, you might have a document type, and the document types would be invoice, shipping notice, receiving document, those things. But we wouldn't necessarily have separate tables for them. And so what the generalized approach allows you to do is, if someone comes up with a new document type, we don't have to change our database, and in theory, we don't have to change our code, although you might have to deal with workflows.
So I know what you were talking about now. So I totally answered the wrong question.
That's okay. You answered a great question.
I should go into politics. I just created my own question. Sorry.
You answered my question.
Well, I'll answer that one then, if I get a second try. I think it's a balance, like anything, and we've learned a bit.
I think my preference is a bit of generalization, because my frustration is to be in a database where there's a separate table for, I can't remember your example, but, you know, employee with red hair and employee with green hair. You know, you can't have all of these. But I think we can also go too far, so that it no longer helps facilitate the model and it's too general. Party might be one where people argue back and forth. Is that too generic? You don't know what it is anymore. So I know that's a general answer, but I think it depends on each case. I think in general, it's better to generalize, so you don't have wasted tables. Just don't take it too far, so that you have a "thing" table that can hold, you know, everything.
Exactly.
I think it's about a good choice of words, Donna and David. What I mean by that is I favor generalizing if it makes sense. In other words, everybody, I think, understands what party is, and you see that generalized quite a bit, right? But sometimes I've seen, like, all types of contracts or policies and whatever being called an agreement. I'm not sure they understand that, and I think the reason why it's hard to choose the right words on that physical side is because things have changed over the years, and one of the things that's changed is we have a lot of end-user business access into these physical databases. Now, they can be masked in a view, certainly, but who wants to constantly rename? You'd sort of like to be able to trace an object as it passes, you know, through the database and into the reporting or into the query, whatever. So I like to use generalization. Sometimes you get into some RI issues, you know; if you need to start enforcing referential integrity and you're housing it all in a typed entity, then that becomes an issue, too.
Yeah, you've really hit on something very important with this trade-off, because I've seen more discussion, like I said, like you even brought up party.
So when I first started doing my data modeling contentious issues presentation, gosh, more than 15 years ago, party was one of the contentious issues at the time. Virtually no one was implementing it; a couple of big companies were doing it, mostly insurance companies. But giving that presentation, where people vote on how they feel about something contentious, I mean, it has completely flipped. The majority of people see the value in it. They're not always able to implement it because the business hasn't bought into it. But that just shows how our approaches to data modeling and design have changed over a decade. I think that's an important thing. The other is about enforcement. So when we have really specific tables and columns and everything, we can use the DBMS to enforce all of these constraints. But when you start generalizing things, now it's all up to the application code to manage that document, or a thing like that. And I think that's the trade-off. The other trade-off is that while people love them, because it allows them to have agile data, not agile data modeling, but agile data, where we can just update data and support new business requirements, most developer tools still want to do all their work by dragging a table, dragging a column, and they don't have a feature for saying, when the value of this column is A, you know, do this. They have to handwrite all that. And I think that's the trade-off.
I have this problem before we even get to design, and that is how abstract or how concrete to make the model in the first place. Because the process in getting there is you do what I'm calling the semantic models, which are the models very much in the user's language, which winds up with lots and lots and lots of entity types, because everybody has his own set of things that he's interested in. And the exercise is to bring those together and then come up with some kind of abstraction that encompasses all of those things.
And bringing those people together to at least recognize this abstraction is not easy. And so a lot of these conversations that you have between the modeler and the programmer are the same problem as between the abstract model and the user community. And it's important to remember the conversations, because then when you go to the physical design, it's probably very appropriate to be much more specific to the community. But in doing that, be sure that you're making the right translations. But the underlying issues are the same. If you want an enterprise-wide system that serves a lot of people, then each of the individuals has to sort of accommodate: okay, how do I use the model's language? And similarly, if you want a system that is more general purpose, then any individual functions have to do that. Now, in terms of the rules, we found in modeling, yes, it's pretty likely you'll do a lot of parameter tables or parameters of different kinds, where an attribute is specific to a particular subtype but a parameter can go to anything. And so I have an additional entity that is the rule. So this particular kind of thing here can only work for this subtype of, say, party. Only some of these can go to a particular party type and so on, and that actually works pretty well. It's a little tricky to explain, but it does the trick.
Right, so good point. This is related also. We have a question in the Q&A from Eric about generalization to the point of key-value pairs. I've definitely dealt with designs like that. As a matter of fact, I've been brought in to fix them. I use key-value pairs quite a bit for just what you're talking about, parameters of something, where, you know, especially on products in a retail model or something like that, the sort of descriptive information we have about a product so varies by the type of product, and then there are new ways of dealing with these.
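The key-value approach to product descriptions mentioned here can be sketched as follows. This is only an illustration under stated assumptions: the tables `product` and `product_attribute` and the helper functions are hypothetical names, and SQLite (through Python's standard `sqlite3` module) stands in for whatever DBMS a real design would target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per (product, attribute name): the key-value pairs.
    CREATE TABLE product_attribute (
        product_id INTEGER NOT NULL REFERENCES product(id),
        attr_name  TEXT NOT NULL,
        attr_value TEXT NOT NULL,
        PRIMARY KEY (product_id, attr_name)
    );
""")
conn.execute("INSERT INTO product VALUES (1, 'Television')")

def set_attribute(conn, product_id, name, value):
    """Store or overwrite one descriptive value; no schema change needed."""
    conn.execute(
        "INSERT OR REPLACE INTO product_attribute VALUES (?, ?, ?)",
        (product_id, name, str(value)),
    )

def get_attributes(conn, product_id):
    """Return all key-value pairs for a product as a dict."""
    cur = conn.execute(
        "SELECT attr_name, attr_value FROM product_attribute "
        "WHERE product_id = ? ORDER BY attr_name",
        (product_id,),
    )
    return dict(cur.fetchall())

set_attribute(conn, 1, "screen_size_in", 55)
# A brand-new kind of attribute arrives: it is just data, not a new column.
set_attribute(conn, 1, "3d_technology", "active shutter")

tv = get_attributes(conn, 1)
print(tv)
```

The trade-off raised in the discussion shows up directly: a new attribute needs no change to the database, but every value comes back as text, so data types and valid values are no longer enforced by the DBMS and have to be checked in application code.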
I mean, you know, the values that describe a TV just seem to change every five minutes. So if we tried to model those with, you know, a column of, you know, 3D technology type, we'd just be constantly churning that database to the point we wouldn't even be putting data in it, we'd just be updating it all the time. So, what do you think? Tom, have you dealt with any designs that have key-value pairs? You say, here's the key, here's the value.
You know, an awful lot. If I were to implement my industry model as presented, just about everything is typed. So you'd have to use that type in addition to another sort of natural key, and it makes it difficult. It really does. So it's a design that I like to avoid. I know many times when my models become physicalized, they may see some changes in that particular area.
Donna?
I've seen the need. I think there's not a great answer right now with the relational model. I know our team is doing some research with, you know, some universities on how you really create this kind of thing; I think it's not just for key-value pairs but other technologies too. Is there a common data model that's valid for XML or big data? NoSQL is one of the questions, which is often key-value pairs. That's kind of a side point. We talked about how a logical data model is still very relational. Is there a logical data model that's non-relational where we can still reuse those pieces? I just don't think we're there yet as an industry. I've seen some prototypes, but I don't think any tool has gotten that quite right yet. So we kind of make strange workarounds to make it kind of relational. But as you guys know, that's not the right thing. So we're in a gray area right now.
So that's a great segue into one of the questions I had, which is, you know, I still think of a logical data model as, I mean, I've used it to implement relational databases, XML schemas, hierarchical. They're not relational at all.
You know, kinds of data stores and messages and everything. And so in my mind, I think that I can use an ERD, which you said is based on relational thought. I would still take those thoughts and implement them in a wide variety of target data stores or other types of data structures. And so in the back of my mind, I'm still thinking about capturing requirements and thinking about, you know, things like, what are the data types and what are the constraints and all of those things for that data. And I still think about putting it into non-relational data stores, but that's been one of the big criticisms of data architecture, data management, data modeling coming from the NoSQL community: you know, all of our models are relational. They can't possibly be used. We must start from scratch. How are data architects supposed to respond to this issue that we think relationally, or this accusation that we can only think relationally, and that all that great work, all those hours and days of being locked in meeting rooms, now needs to be thrown out because we're implementing in Hadoop or something else like that? Who wants to tackle that? It's very hard not to think relationally. In the sense that I specifically am allergic to foreign keys appearing on my data models for that reason, because a foreign key is the relational implementation of the relationship. If a model consists of representations of things of significance and their relationships with each other, where the relationships are expressed as sentences, you know, subject, predicate, object, you know, official language, this is the world that they're in. And if it gets implemented in XML or it gets implemented in Hadoop or something else, the underlying definition or description of the business should not be affected. And a complaint about information engineering is that it has a whole, really complicated way of representing identifying relationships.
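The point above, that a relationship is a subject, predicate, object sentence about the business and that a foreign key is only one implementation of it, could be illustrated with a small sketch like this. All names here (`Relationship`, `as_foreign_key`, `as_nested_document`, and the order example) are hypothetical, and the two renderings are just two possible targets, not how any particular tool works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """A technology-neutral assertion about the business."""
    subject: str    # entity the sentence is about, e.g. "order_line"
    predicate: str  # verb phrase, e.g. "is part of"
    obj: str        # entity at the other end, e.g. "order"

    def sentence(self) -> str:
        subject = self.subject.replace("_", " ")
        obj = self.obj.replace("_", " ")
        return f"Each {subject} {self.predicate} one {obj}."


def as_foreign_key(rel: Relationship) -> str:
    """One possible relational rendering: the child carries a foreign key."""
    return f"{rel.subject}.{rel.obj}_id REFERENCES {rel.obj}(id)"


def as_nested_document(rel: Relationship) -> dict:
    """One possible document rendering: children nested under the parent."""
    return {rel.obj: {f"{rel.subject}s": []}}


rel = Relationship("order_line", "is part of", "order")
print(rel.sentence())
print(as_foreign_key(rel))
print(as_nested_document(rel))
```

The sentence stays the same whichever rendering is chosen; only the implementation-specific output differs.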
It's a pity because this relationship can be marked as identifying just with a little tick or something like that and that's all you need. We focus on what are the things and how are they related to each other, not the particulars of how they're going to get implemented. And I've always tried very hard in the modeling part to be as far away from even relational technology as possible. Having said that, of course, modeling itself, you know, came into existence thanks to relational theory, and that's certainly hiding back there. It's certainly relational technology that I think we should stay away from in our architectural models as much as possible. The logical model, I agree, tends to be sort of relationally biased. I think it's all about the relationship, though, is what I think. You know, when you're sitting there with a business and trying to solve a problem, it's all about the relationships in the data. It's what guides you to collect that piece of data. In almost any enterprise today, you know, you go in and everybody wants to tell you an attribute about a customer, but somewhere something needs to tell you what you really need. And as you get into this world of big data, I think those relationships that exist, the business discussions, are very critical, and it may manifest itself in a relational model, but people still need to understand relationships no matter what the technology is. Relationships, absolutely, I agree. But to me, relationships are assertions about the nature of the business. You know, any kind of big data or, you know, next generation of data management and so on has to ultimately come back to that. You brought up an interesting point and used information engineering in doing this. So, of course, in the original sort of information engineering notation, there weren't any keys, in the original tools anyway. Right, no. You know, key columns didn't show up in the old tables, right?
You actually just saw the relationship listed. You saw a list of owned attributes and then you saw a list of relationships that came into that table. And it was the advent of IDEF1X, a notation which requires showing of foreign keys. And now it's the tool vendors, and I think pretty much all the tool vendors that support IDEF1X and what they call IE, which is just a different flavor of IDEF1X, a different notation, but still IDEF1X. You can still turn off the showing of foreign keys on child tables. Even there, because my models show identifiers, the notion that this relationship is an identifying relationship is very important, but you just put a tick on the relationship and that's all you should have to do. Yes, the relationship, depending on which notation you choose, is drawn differently, depending on whether it's identifying or not. So it all comes down to how much flexibility the tool vendors allow us in what we want to show in our model and how we want to show it. This is also sort of the influence of standards. So IDEF1X is a standard. Vendors want to be compliant with the standard. They follow the standard. Whereas some of these other notations weren't standards, they were just notations that people could adopt. So in those other notations, we could have a little bit more flexibility in how we wanted to depict those. Donna, do you want to jump into this? I will. Speaking as the vendor, there's the rub. If you say you support a standard, you do support the standard, and that's how it is. Exactly. I think the industry is evolving and I think the good news is people are wanting to show a tool model to a business user. In fact, some of the Q&A was on that as well. So what we've done in the tool is very much how you would experience it. At its crux, underneath the scenes, there's going to be that foreign key. You don't have to show it. I mean, you can show a model where there are only pictures, icons, and no lines at all.
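A rough sketch of the distinction being drawn here: the "identifying" flag lives on the relationship, the foreign key columns are derived from it, and whether those columns are displayed is a separate presentation choice. The classes and functions below (`Entity`, `Relationship`, `physical_key`, `display_columns`) are hypothetical illustrations, not the behavior or API of any modeling tool.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    identifier: list[str]  # the entity's own identifying attributes
    attributes: list[str] = field(default_factory=list)


@dataclass
class Relationship:
    parent: Entity
    child: Entity
    identifying: bool  # the "tick" on the relationship line


def inherited_columns(rel: Relationship) -> list[str]:
    """Foreign key columns the child would carry in a relational design."""
    return [f"{rel.parent.name.lower()}_{col}" for col in rel.parent.identifier]


def physical_key(child: Entity, relationships: list[Relationship]) -> list[str]:
    """Primary key of the child: inherited keys of identifying parents, then its own."""
    key = []
    for rel in relationships:
        if rel.child is child and rel.identifying:
            key.extend(inherited_columns(rel))
    return key + list(child.identifier)


def display_columns(child: Entity, relationships: list[Relationship],
                    show_foreign_keys: bool) -> list[str]:
    """What the diagram box lists; foreign keys are optional decoration."""
    columns = list(child.identifier) + list(child.attributes)
    if show_foreign_keys:
        inherited = []
        for rel in relationships:
            if rel.child is child:
                inherited.extend(inherited_columns(rel))
        columns = inherited + columns
    return columns


order = Entity("Order", ["number"], ["order_date"])
line = Entity("OrderLine", ["line_number"], ["quantity"])
rels = [Relationship(order, line, identifying=True)]
```

With the flag set, `physical_key(line, rels)` includes the inherited order number, while `display_columns(line, rels, show_foreign_keys=False)` shows only the attributes the entity owns.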
It could look exactly like a PowerPoint. So we've got a little flexibility. I know, sorry, but I could just say, since we're in the DATAVERSITY circle, I want to point to the Big Challenges in Data Modeling survey, which keeps me up at night, because when people are asked, what's your most used data modeling tool? So ERwin is number one. We were pleased with that. And Visio is number two. The reason is because people want that just enough. I want to show something quick. Well, it's probably free on their desktop, but it is more flexible. But then the negative is you can't generate a database at the end of the day. So that was one of the struggles we had: how do you add that Visio or PowerPoint-like layer onto the real engine. And so that's not 100%, but we took a lot of steps. And it's kind of an unknown. A lot of folks aren't using it as much, because it is newer. But really, you could. You could hide everything and make it look blue and purple and have little pictures of people if you wanted. That would be more accessible. Actually, my request is a little different from that. I don't dispute the point you're making. I grew up on the Oracle notation. Richard Barker and Harry Ellis. And this is actually a very disciplined notation. And it's not so much the notation itself, because the notation is very simple. But it has a disciplined language around it. And it's very amusing that the new hot buzzword is semantics and semantic technologies. And RDF is exactly the use of language that we've had for some years now in the data models. And all you need is a line that is solid or dashed for optional, and a crow's foot or not. As I say, I like to show the identifier. So I put a character next to the attribute and I put a little tick mark on top of the relationship. And that's enough for relationships to show identifiers. And then we're done.
I find it ironic, by the way, when I went and tried to do this all in UML, that actually if you use stereotypes for identifiers, which I had to do because UML didn't know about identifiers, you put the stereotype next to the relationship or put a little stereotype next to the attribute, and that does the trick very nicely. And it doesn't change the shape of the relationship or the box. It doesn't change the kind of line in your drawing. It's just a little extra piece of information, which is very nice. Great discussion about these things. And we're getting to our last 15 minutes. I want to make sure that we have time to answer some of the good questions we have in the Q&A and some in the chat that I spotted. But I wanted to bring up one topic. One of the things that I struggle with as a data architect, as opposed to some of the other roles, is finding out how other people have solved these problems. It just seems, and I know some of you panelists and I have talked about this before, it seems like the data management, data modeling community doesn't do a lot of sharing of some of their questions and solutions. The rest of the world seems to blog a lot about how to solve a specific problem. So not blogging like a lot of my blogging recently, which is about the theory or love your data or any of those things. But actually sharing, you know, here's what we tried to do, here's what we did. And yet all the other communities, the DBA community, the developer community, seem to do a lot of blogging and everything. But people are contacting me all the time saying, I can't find any blogs on how to do this and how to solve this problem. Why is it, do you think, that we're not sharing some of our design problems and solutions, or do you disagree? Anybody? I have an opinion, not politically correct, probably. I tend to think that we're a community of, like, older folks.
I mean, I like the old cronies, but I think you look at some developers and they're open, and it's just sort of like the kids today with texting and Twitter and stuff. You know, they're willing to do that. But, you know, when I get in a group of people to try to convince them to even read my blog or to respond to something on a message board, the answer is always, I don't have time to do that. That's the answer. If you took the time to do that, you would probably solve your problem faster. I think it's a cultural thing right now. I'll jump in and be politically incorrect too, but I'll start by saying I don't think we're as bad, I think we're a little hard on ourselves, because look, there's over 140 people talking about challenges in data modeling today. So clearly, there are people. But a part of it, and I can tease data architects because I am one, is that in the sciences it's sort of in to discuss and say I disagree with your theory and go back and forth. I think a lot of us don't want to put that out there because then we'll be seen as wrong. So I don't know, I think that's part of it too. We have to have this perfect model and everything. If we show the doubt out there, does that make us look weaker? I don't know. There was an interesting seminar with Len Silverston that did a whole thing on character types. And that actually came up from many people in the audience. He's like, yeah, I'll send an email, and I have a typo. I have to send another one to correct it. Why do I do that? Yeah, I do that too. But I don't know. I think that's a piece of it. Right. That might make us great data architects, right? The detail-oriented needing to get it right. And so I have heard the age thing, but I'm very experienced. We're all pretty experienced. But I work with enough experienced DBAs, experienced developers, and by that I mean old, that I'm not quite sure that it's that.
And I do think it's more of. One of the things I'm going to be doing at EDW next week is giving a short session on how to get started in blogging and some of this stuff. So I'm hoping to maybe encourage some more people to share some of these things. So good to get on that. So I'm looking at some of the questions in the queue. And one of them is from Carol: please compare and contrast the use of UML versus ERD for modeling. David, I bet you have a session coming up someplace. Maybe at EDW. I do. Is it at EDW? It's at EDW. It's going to be too late after one o'clock. And then everybody has to come there. And while you're at it, I have a book on the same subject, and I will be autographing it that same day. And it's going to take a couple weeks. So you're actually Carol, aren't you? No, Carol. No, I didn't. I didn't plan that. I didn't plan that. Okay. Another question is, can you see data modeling evolving to accommodate the challenges of big data? So I kind of addressed that. That's from Rob. I kind of addressed that when I talked about NoSQL. Does anyone have a quick thought on that? I'm very curious. I'm in the hopes of big data. Well, big data is a new technology. And there's all kinds of stuff that I don't know anything about. I'll take, you know, you can use it as a starting point. At the very least, it had better be describing the business that we're trying to describe all the way along. And I hope that at least the essential models that I come up with are still going to be applicable in a whole new technology for organizing the data. Donna? This is something we've been looking at a lot here at CA. In some cases, it's two different camps, unfortunately, in an organization, because it's really kind of turning what we do on its head: instead of top down, design and then build, it's really bottom up, discover and then find it. But there are still patterns there. And we have seen some success with teams working fairly well together.
We have a success story on our site with ARTS. I know you're familiar with them, Karen. The Association for Retail Technology Standards. And they have an interesting method, and we've seen some others do it as well. But you really use the big data as a source. Really, it's another source for your data warehouse. You do some discovery. You do some filtering. Clearly, you don't just dump everything from Twitter into your data warehouse. But are there patterns you've discovered on Twitter about your individual organization? You were talking about younger folks Twittering, but what do they buy? Can you find patterns, or maybe the date and time of day when people purchase? Maybe that makes sense, and we've never tracked that before. That particular one was a nice way of kind of having that old and new. Because I agree with David, you do need some. Companies are just struggling with, now I have this, how do I make sense of it? We're not 100% there with the tool. There are things like Hive that kind of have a SQL layer on top of it. It's something I'm seeing folks have some success with. We're not 100% there with the data modeling technology around it, because as I said, it's kind of a whole different thing. But there are companies kind of melding the two to some success. Yeah, that's very interesting. Because when I think of big data, I mean, there really are two use cases. There are people who are consuming big data, like they're reading sensor data, like in retail video analytics and posting patterns, and picking it up with credit card purchases and what you said on Twitter and everything. And then there are people who are using big data to kind of publish all that stuff. And I shouldn't say big data, more like NoSQL technology, so related technologies, to publish things, like to produce their product catalog and create stores of data for other people to consume.
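The "big data as another source for the warehouse" idea mentioned above (discover, filter, then keep only the pattern, such as time of day) might look something like the following in miniature. This is a sketch under assumptions: the event layout, the `mentions_by_hour` function, and the sample records are all made up for illustration and do not represent any real feed or the method any organization actually uses.

```python
from collections import Counter
from datetime import datetime


def mentions_by_hour(events, keyword):
    """Filter raw events to those mentioning the keyword, then count by hour of day."""
    counts = Counter()
    for event in events:
        if keyword.lower() in event["text"].lower():
            hour = datetime.fromisoformat(event["timestamp"]).hour
            counts[hour] += 1
    # Only this small, structured summary would be loaded into the warehouse.
    return dict(sorted(counts.items()))


# Hypothetical raw social-media events; not real data.
sample_events = [
    {"timestamp": "2013-04-22T09:15:00", "text": "Just bought a new TV at ExampleMart"},
    {"timestamp": "2013-04-22T09:40:00", "text": "ExampleMart checkout line is long"},
    {"timestamp": "2013-04-22T21:05:00", "text": "Late night shopping at examplemart"},
    {"timestamp": "2013-04-22T21:30:00", "text": "Watching the game"},
]

summary = mentions_by_hour(sample_events, "ExampleMart")
print(summary)
```

The raw events stay in the high-volume store; the summary is small and structured enough to be modeled alongside the rest of the warehouse.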
And I think that really changes where the data model fits in, because if you're wanting to consume external data that just has large volume, variety, velocity, all of those Vs, then you're trying to take that data in and somehow make use of it within your structured data, within your traditional RDBMSs. You're going to think about how you model that data that's being stored. But if you're publishing out data to Hadoop so that other people can consume it, then you're going to think about it in a different way. And I think that's one of the things that a lot of people miss, you know, they just think of big data as wanting to use it. Interesting. So there was a great question here. What happens when the business users are used to accessing a physical model and don't care about the logical model? How many of you guys have that? That actually is my biggest problem. For example, the whole notion of how you do requirements analysis: you go interview people and you say, what do you need? And they say, well, I need this system to work better. So what is your business need? Well, they say, I don't know. What do you got? All they know is the current physical system, and that has limited their ability to think about the overall set of problems that they're up against. The evidence of that in the long run, Karen, is that the folks get out there and somebody, oh, somebody in the department says, oh my gosh, I found out how I can use Access and connect here. I'll take care of that stuff, IT. We'll take care of this for you. But what eventually happens is, they don't understand the data or the results. They go, gosh, this line doesn't balance to the line you're telling me over here. And so guess what? They come back and they ask, do you like have something that shows how this data kind of relates? And they come back and they ask for the model. So, you know. Good point. Many continue that exploration on their own.
I do tell my teams that, you know, no one's going to love the logical model as much as the logical modeler does. Because I tell my teams that they should... Oh, that's horrible. ...to anybody who builds something. And, you know, but they still need to respect it, that it has a role, and then it got them their physical model and got them their DDL and their XML messages and their canonical model and whatever else. And then it has an important role there in that they'll see that importance as we build more and more physical models off that logical model. And I have a story on the end user who, you know, we finished one project, worked on it, started a new project. And when the light bulb went off over his head, when he realized that we had 80% of the work already done because we had done a logical model, and weren't relying on a physical model that we could take the logical model now and reuse a big chunk of it to solve this next business problem, the light bulb went off in his head and he says, I'm never going to be on a project again where we don't use these things. So, you know, he became our greatest cheerleader because he could see the value in that and think that, you know, it's really tough a lot of times I'm brought in a project usually to solve some database design problem and virtually no one wants me to do the logical model until they start seeing some of the value of it's usually more accessible to the end users. They see how the end user is like reading it and how the end users hate reading the physical model. And that's really the presentation style thing mostly. They do start seeing the value of it. And that's the type of stories I want to share with other data architects about those differences. I'm looking through the chat. Guys, you guys are chatty. That's really good. I love that. I did chime in, by the way, I actually pointed out about the blogs. 
Strong recommendation to go to www.tdan.com because there's a whole lot of things there. There are articles there, which means that they are things that somebody put some thought into, and they cover the whole field. And this is what I recommend to everybody. Excellent. And both Tom and Donna blog at www.erwin.com as well. Is that correct? Tom does. And he's a community for me. Yes. Yes, that's from www.erwin.com. Yes. And I have my own blog and they do have a website blog. I'd just like to see more people sharing their experiences or even sharing their questions. We have all these great questions and discussions going on in chat, and we'll be able to save that chat, but no one will be able to see that. And people can listen to the recording that's being made of this session and everything. But one of the great things about writing this stuff down, and data architects, we should be fans of writing stuff down. That's what we do. You know, they have these big index machines out there on the interwebs that will go and help people find them. So trying to find the audio from this, I mean, that technology exists. It's going to be different. A lot of people in the chat brought up the fact that yes, there are LinkedIn groups. The modeling tool vendors have groups, and then there's a data architecture group, and there's a group that I run, a mailing list for those of you more experienced, DM Discuss. There's all these places where people can have discussions, but the great thing about being able to write things, whether it's articles or blogs, is, you know, you can include graphics, you can include videos, you can include written media. It can also be indexed. It all has metadata. You know, it can be served up to people in lots of different ways. I think that's great. And DATAVERSITY as well. So I blog at Dataversity.net, which is the host of these webinars. There's lots of great bloggers there and articles and videos and all these great resources.
But I'd also like to hear from, you know, people who work in the trenches who are doing these things. And also, one of the biggest complaints I get is that there isn't enough how-to-do-something-in-a-tool stuff. That's where we're actually the weakest in sharing information. So years ago, what was popular were the forums, like the online forums that I hosted at InfoAdvisors, but the participation in that has really dropped off. I'd like to see people sharing those things. As we get down into the last couple of minutes, Donna and Tom, what are you guys doing at Enterprise Data World? I'm facilitating the ERwin SIG on Tuesday morning at 7:30. So any CA ERwin users are welcome to come and meet me. And a lot of CA ERwin folks will be there and probably an awful lot of community members. So I'm happy to. I'll be at Tom's SIG or I'll get in trouble, even though it's first thing in the morning. Fine, it's the hour today. I'll be doing a presentation on Tuesday as well. In the afternoon. We'll be making a big announcement that I can't say anything about yet. So stay tuned and check the press releases on Monday. We'll have a new partner we'll be talking about. So that will be kind of the big news for us next week. Excellent. And Dave, you said you're doing the UML thing. You're doing anything else? I'm playing it by myself. Excellent. I'll get a tweet, or Karen will get mad at us if we don't tweet. Yes, we shall be tweeting there. And the hashtag for that is EDW13. And I'm doing a workshop on Thursday afternoon on advanced data modeling, on keeping yourself happier and your team members happier and adding value and making sure you maintain your relevance in your company. So that's what I'm doing. I'm doing the blogging thing. I'm also moderating the lightning talks because, as the two-time champion of the lightning talks, I've been told not to do it anymore. So now I'm going to moderate those. I forget what day that is.
I think it might even be Monday, but that's a great kickoff. It's in the evening. I believe there are beverages and people get to talk for five minutes and do their lightning talks. And I think that's pretty much it. Other than that, I'll be talking to a lot of people as well. So I'm excited about that. And we're getting to the top of the hour, so it's time to wrap this up. Now, panelists, I'd like to invite you, some of us, to stay on for a little bit so that we can participate in some of that. But the webinar part of this thing will be gone. We'll be chatting a little bit. The recording will be turned off. That's the best part. And I wanted to thank Shannon for being our great moderator and editor for this and keeping us in line and getting us all here. I wanted to thank CA Technologies for sponsoring the webinar. I think that's really great. That shows great support for the community. I always love to see vendors supporting the community. And I thank all of our attendees, because I consider you guys the fourth panelist here, especially with all your great chat and questions. So thank you so much. Shannon, you have something to wrap this up? Okay. I think you just said it all very well. You know, thank you to everyone. And I really look forward to seeing everyone at EDW. But that's my big bye for EDW, meeting everybody I've worked with in these webinars. Meeting them in person. So of course, thank you to CA Technologies for sponsoring it. Again, you know, we just couldn't do it, producing these free webinars for you guys, without their support. And yeah, thank you so much, everyone. Another great discussion, Karen. And thank you, panelists, Donna and David and Tom, thank you so much for a very energizing discussion on data modeling today. Okay. And thank you. Thank you. So I'll turn off the recording and let you guys chat away if you want.