Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We want to thank you for joining the latest in the monthly webinar series, Data Architecture Strategies with Donna Burbank. Today, Donna will discuss Data Modeling Best Practices: Business and Technical Approaches, sponsored today by Couchbase. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section. Or if you like to tweet, we encourage you to share insights or questions via Twitter using hashtag DAstrategies. And we very much encourage you to chat with us and with each other throughout the webinar. To do so, just click the chat icon in the bottom middle of your screen to activate that feature. And if you'd like to continue the conversation after the webinar or follow Donna further, you may do so at community.dataversity.net. As always, we will send a follow-up email within two business days, containing links to the recording of this session and additional information requested throughout the webinar. Now let me turn it over to Andrew Oliver from Couchbase for a word from our sponsor. Andrew, hello and welcome.
Thanks, Shannon. So Couchbase is pleased to be sponsoring today's webinar on data modeling. While some people prematurely declare data modeling dead because of NoSQL databases like Couchbase, we at Couchbase think that data modeling is actually more important than ever due to trends like NoSQL and cloud computing. Couchbase is a leading cloud-native document database and key-value store adopted by the likes of Amadeus, American Express, and Tesco. Database development was more or less static for 30 years after the widespread commercialization of the relational database in the 80s.
However, an increase in both the amount and the different types of data people are generating created a need for a new type of database. Not only is more flexibility needed today, but also much higher scale and lower latency than can be accomplished with traditional vertical-architecture databases. Think about that flexibility for a moment. In the 90s and early 2000s, we had a lot more payment and communication types emerge. Anyone working on an RDBMS back then remembers the schema changes when people started having a multitude of communication methods, from way more phone numbers to email addresses to chat clients and home pages. Now imagine those consumer devices and even embedded devices. There are more and more every day. There are also different types of metadata and associated data, like sensor reads and attributes. It simply isn't possible to create a relatively static table structure that captures all the fields and field changes efficiently. We have to adapt both our database and our modeling techniques to this new world. Now, I gave kind of a consumer and manufacturing sort of example, but I'm sure all of you are seeing this throughout enterprises and software and what have you, where there are more and more different types of things and attributes than you could capture in that more traditional static table structure. Yet we can't give up. The pattern of the JavaScript developer with no modeling and architecture experience who just haphazardly creates an object model and persists it in the data store doesn't work so well in a large-scale production or enterprise application. We're dealing with some of the negative effects of that in a lot of the software that's out there today. NoSQL databases like Couchbase add and remove concepts from our data modeling lexicon. The conceptual model is a lot closer to the physical model. We no longer have the requirement for attribute tables just because we don't know how many phone numbers somebody may have.
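To make the phone-number point concrete, here is a minimal sketch in plain Python with no database involved. The record shapes, the field names, and the helper `to_document` are illustrative assumptions, not Couchbase APIs. It shows the same customer once as a row plus a separate attribute table, and once as a single JSON-style document whose list of phones can grow without a schema change.

```python
import json

# Relational shape (hypothetical column names): one customer row plus a
# separate attribute table holding one row per phone number.
customer_rows = [{"customer_id": 1, "name": "Pat Example"}]
phone_rows = [
    {"customer_id": 1, "kind": "mobile", "number": "555-0100"},
    {"customer_id": 1, "kind": "home", "number": "555-0101"},
]


def to_document(customer, phones):
    """Fold the attribute-table rows into one JSON-style document."""
    return {
        "type": "customer",
        "name": customer["name"],
        # The list can hold zero, one, or many entries with no schema change.
        "phones": [
            {"kind": p["kind"], "number": p["number"]}
            for p in phones
            if p["customer_id"] == customer["customer_id"]
        ],
    }


doc = to_document(customer_rows[0], phone_rows)
print(json.dumps(doc, indent=2))
```

The document shape sits closer to the conceptual statement "a customer has phone numbers", which is one way to read the remark that the conceptual model moves closer to the physical one.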
We gain the ability to map high-volume, simple data to key-value stores. Just a thing with a key. Just a warning that while key-value stores aren't Couchbase-specific, having both the document store and a key-value store in the same place is Couchbase-specific. Okay, everyone can hear okay? Yep, you're fine. Okay, cool. NoSQL databases add concepts like buckets, but we do away with tables. There are collections and items, and events, for instance, instead of triggers. Not everything is new, but some things are a bit different, like the way indexes work. Also, when you're mapping an entity in a document database like Couchbase, you have the option to either reference the dependent object or embed it. As usual, you just have to think through things like the transaction or life cycle of the object. Meanwhile, as NoSQL databases have matured, some things are getting less different. It used to be that in a document database, for instance, you had to put everything you wanted an ACID guarantee on into one document. These days Couchbase, among others, supports transactions. Ironically, Couchbase, a NoSQL database, also supports SQL. Meanwhile, some RDBMSs now support storing JSON documents in columns in the database. It isn't exactly the same as a document database, but it is a coming together of sorts. But the more things change, the more they stay the same. I took a college class on data modeling in the late 90s. We used ERwin and created models and even reverse engineered existing database models. With my first job in the late 90s I used ERwin on SQL Server. Today you can use erwin on NoSQL databases like Couchbase, and even reverse engineer existing models. So while things have changed, and data modeling techniques have to take that into account, it's nice to know how much hasn't changed. So with that we'll turn it back to you, Shannon.
Andrew, thank you so much. And if you have questions for Andrew, he will be joining us in the Q&A at the end of the presentation today.
So feel free to submit your questions in the bottom right-hand corner of your screen for that. And now let me introduce the speaker of the series, Donna Burbank. She is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities with data and information. She currently is the managing director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia and Africa, and speaks regularly at industry conferences. She was in fact just with us last week at the Data Architecture Summit in Chicago. And with that, let me give the floor to Donna to get today's webinar started. Hello and thank you.
Thanks, Shannon. Always a pleasure to be on these, and for those of you who were in Chicago last week, it's always nice. I often ask at my sessions how many people see these online, and I get a good show of hands. And a lot of you do come up in person and say hi, which I always appreciate. So much appreciated, and thanks for joining today, where we will be going deeper into what Andrew started to talk about, which was data modeling. Both from the business perspective, and I'm very happy that Andrew brought up the idea of a conceptual data model, because I do think that's kind of the lingua franca across a lot of different technology types. But we'll also talk about some of the technology approaches because, as he mentioned, there's a lot of change in the industry in how and where data models still apply. For those of you for whom this may be your first webinar, I just wanted to let you know that this is a yearly series, and all of the previous sessions are on demand, I think forever, out on the DATAVERSITY site, so you can catch all of them, including this one, if you want to pass it to a colleague or watch it again.
And do join us next month, in December, for the next one, the last one for the year. I know at the end of the year a lot of people are starting to plan next year and what their new future-state data architecture looks like with all the changes in the industry. It's a good time to kind of step back over the holiday period and think about that, so we'll be chatting about that in the December timeframe. But without further ado, today's topic is data modeling, and I want to reiterate what Andrew mentioned. I'm not sure why it even came up that data modeling would go away with some of these new technologies, but that was sort of the buzz for a while, and that buzz is going away. We've done a few DATAVERSITY surveys, and over and over again, I think in one of them 97% of the organizations responding were using a data model. Certainly not going away. They're changing. They're evolving. I completely agree with that, and it's part of the fun of being in data management today. There are so many different choices, and I thought it was also interesting when Andrew mentioned that there's a lot of merging between the technologies. There are a lot of good approaches, and what used to be in a document is now in a database and vice versa. So it kind of adds to the complexity but also the fun. But I think one of the appeals of data models is that they can take that complexity and turn it into something intuitive, so that not only the technical team can benefit but also business stakeholders. More and more, and you've probably heard this in my other webinars, I'm a big fan of mentioning this and discussing this and building strategies around the fact that business and IT are merging as well.
What a, quote, business stakeholder is and what a technical stakeholder is are starting to merge, and more and more business people are really interested in tech. They might not want to code, might not want to look at DDL, but they do want to understand the data structures, and that is a great use for a data model, because it can really bridge those two worlds. As data management professionals and as data modelers, we love our definitions, so I thought we'd kind of start out with what a data model is, if you are new. The first definition is from my colleagues Steve Hoberman and Chris Bradley in our Data Modeling for the Business book, where we tried to step it back, and it really is that set of simple symbols to communicate concepts and the business rules around them. And we often use that idea of an architectural diagram for a house: if you are going to have a house built, you might have an architectural diagram. Why do we use that? I've never built a house before. I don't know how to wire all the electricity in my house, but I'm glad somebody does, and I'm really glad that it can be communicated in a really clear way. All I have to do is look at those simple pictures, and basically you've got a sense that there is a house behind this and there's electrical wiring that goes to different rooms and that sort of thing. So not only does it convey these images in a technical way, but it shows the relationships between concepts and objects, and we'll talk a lot about that. I think that is some of the value. It's not just the boxes on the model but the lines between them. Taking another definition, from the DAMA Dictionary of Data Management: it's a model that includes those formal names that I mentioned, definitions, which are so important, the proper data structures, and these precise data integrity rules. So often these look very simple, and I'll show you examples later in the presentation if you're not familiar with data modeling.
But simplicity can belie underlying value. Something as simple as, to use the example Andrew gave, can a customer have more than one email address? Can they get a text on a cell phone? We've probably all had some of the negative customer experiences where those business rules weren't in the system. Can I have a PO box and a mailing address? Or am I getting my mail at the wrong place? Some of these very simple business rules can really have downstream business effects. So that's where these models come in handy. The other nice thing about models, and I've been in modeling for a long time now, is that they are in active use across a wide variety of industries, and that's one of the reasons I'm still in this industry, because it's fun. I think if you've been in the industry as long as some of us on this call, myself included, we've probably all done our stint in financial services or insurance or government, and there's nothing wrong with those industries. I think what's sort of fun is that there's such a diverse set of industries we can work in now that everyone's sort of wise to the fact that data models can help. So I tried to take some real-world use cases from our practice. As you know, Global Data Strategy does this for a living, and we have a lot of fun with it, partly because there are just so many people interested in data nowadays. The first one, which we'll actually talk about again, is environmental data sampling. The UK Environment Agency actually was on with us earlier this year in December, so you can always see that webinar again on how they use data models for environmental sampling. I think my favorite one is that second box, early childhood development. We're actually doing a project with a Head Start program outside of Detroit, and these are early childhood educators going up to a whiteboard and building a data model. And for those of you who say that, quote, business people, and I don't like that term either, business person.
In that case there's a teacher, right? A business person could be a scientist. We kind of use that very broadly. I think we gave them a five slide. What is a data model and what is cardinality and what do these things mean? One to many and that kind of definition and with that they were building their own data models and kind of arguing is a virtual classroom the same as a regular classroom and a classroom have more than one teacher, et cetera, et cetera. So I think that's the beauty of these models across industries. Everybody can understand them very quickly and really get to the crux of some core data issues. Now e-commerce and digital transformation we've got several customers using data models for that now and again that sort of goes in the face of data models being old school. These are companies that are looking to completely transform their business from either brick and mortar to digital from very traditional transactional to more e-commerce and the first thing they do is build out a data model. What is the same and what is different? Is mobile number more important than perhaps the land line or et cetera, et cetera? Do we even use email anymore in this new world? And that is all discussed through a data model. Back to education we did a large data model for a university out in the Midwest and they were trying to really understand their student journey and what is a student? Is a student an online student who might be coming back from, you know, might be a military person that's doing this part-time or a mother with a child in their lap or someone on campus. All these kind of core data definitions were done through the data model. To the bottom we had a water utility, kind of doing some data modernization they were trying to do some digital modernization and they used a data model. 
We have a construction company and they're really trying to understand how they can better bid their work and kind of do benchmarking across when they bid the work and when they develop the work. First thing they wanted to do is build a data model. Agile software development, I want to bring that up because that's another kind of elephant in the room of we don't do data modeling, we're agile. Good news is I hear that less and less because I think people are realizing that you can't skip that step or you're going to come back and do it again. And so, and I'll talk about this later in the presentation, doesn't mean you have to build an enterprise data model that takes a year and a half. You can do I think some sort of enterprise view is important to start with, but then you can break down your data model into kind of chunks and do it in an agile manner. Customer centricity, that last example, we have a lot of organizations really looking to understand customer. Those of you in the call who have done data modeling probably more than ten minutes, you've probably done some sort of what is a customer type data model. This is a membership organization in Europe we're working with and they're really trying to understand what is a member of their organization. There's an online member that is a subscribing member, etc., etc., and then all of the touch points with their different products and they used a data model for that. So hopefully those might have jogged some thoughts in your head that it isn't the traditional data model that you might have done and not necessarily even the traditional industry. I would say almost any industry that you are on this call could use a data model. In fact, one of our clients again, she was a quote business user, she was a scientist, had never done quote data models before and she came back the next day and said, I modeled my kitchen. 
I'm a baker and I really wanted to organize my cabinets and the first thing I did was a data model and they are addicting and so once you sort of get that mindset in your head you often will be building these data models just for sort of life events because they're just a very simple way to really simplify complex topics. So I mentioned this before with the idea of agile or skipping data modeling and again I see that less and less but my mother always said, if you don't have time to do it right, you have time to do it again and that is really where data modeling comes from. So this is, yes I will put cartoons in this presentation because there are data modeling cartoons and since I have them I must use them because it's just a very unique thing. But this might not make anyone else who's not a data modeler laugh but or maybe not even you but I'll try this idea that these people are all the way through acceptance testing and everything looks great and we're going to build this new marketing application. Just a question, what's the customer? Are we going to define that? 
And that might not seem funny to you until you've been on a project where this has happened. And I would say, ugh, I've been in the industry forever doing this, and even just this year we have had three separate customers make those very embarrassing what-is-a-customer mistakes, like sending renewal notices to people who aren't customers, sending "please buy this product" to customers, sending out too many emails to the same customer because they have more than one address. We had one, actually two, two of our customers did this, where they sent physical mailings to somebody, and one poor individual, one was in Europe and one was in the US, got thousands of mailings. I can picture in my head what happened in the data model, and you might as well. And you might have thought somebody on the human side of that might have noticed that one individual was getting a thousand magazines or a thousand letters, but that's the type of thing that happens. They sound so ridiculous, and somebody who's not in data could say, how on earth could a company do this? Those of you who've done data modeling might actually know that some of these mistakes can be made by just having a cardinality wrong, or just missing a relationship between core entities that have to do with customer, or getting the definition of customer wrong. Is it a lapsed customer? Is it a membership customer? Is it a customer with maintenance? Is it a prospective customer, where they haven't bought the product but we're going to call them a customer because we'd like them to be? Et cetera, et cetera. And that is why data models are important. I think it's also important to note that all data models aren't created equal, and that there are different levels for different audiences. So especially if you're a large organization, you may even just want to start at the high level, the enterprise or subject area level: what are our core buckets of information? We have product, we have invoicing, we have sales, or however you want to do those
different subject areas, just to start that grouping. Generally I always start with some sort of conceptual layer, and that really is those core business rules. It gets down to what is a customer, what is a member, what is an employee, what are the concepts and the rules around them. And for those of you who might say, isn't that a little academic, do we need to do it? Hopefully by the end of the presentation we'll have shown you the value of that. But I also would say, in the spirit of Agile, or being more agile, you can do these things very quickly. We've mapped out a data model in workshops with either a whiteboard or sticky notes or just conversations in an afternoon, and I'll give you an example later of one we did in a week that was pretty much complete and usable, and we had actually uncovered a lot of core business issues just by mapping that out with some key business and technical stakeholders. Logical takes it one level down, where yes, it's still at the business level, and we'll talk more about that, but you are getting into more of the detailed business rules and data structures. Not database structures, not, you know, platform structures, but think of hierarchies or relationships between things. I would still say that is a business level, maybe more for your data architects, but definitely more the business focus. And then when you're getting down to the physical, that really is where we're thinking of a physical table. Is it a relational database? Is it a key-value store? Is it an XML schema, a COBOL copybook, whatever? And that is important at that stage. I don't want to belittle that, but I also want to mention the focus up at the conceptual level, where you really don't want to skip that, because that's getting your business rules. The question we often get is, where do you start? It can be complex, and you don't want to ignore the existing technical environment, but you don't want to ignore the business either. So do we start with a top down?
Do we start with a bottom up, or do we kind of do a mix in the middle? I think in reality it is that mix in the middle, and you do an iterative approach. In a perfect world, and probably in general, I always start with the top down, because that's the why. What are we doing? What are we prioritizing? That helps you focus, because no matter what your company, you probably have more data than you can actually manage, and so you have to prioritize. Physical is great because that shows you the real world, and Andrew mentioned this as well with the data modeling tool he mentioned earlier, but there are others such as ER/Studio and PowerDesigner. Most good data modeling tools in the market can, quote, reverse engineer and do a great inventory of your physical environment, literally with just a click of a button, and if you haven't discovered those tools, please use them. It's a great way to look like a hero by creating some of these models and really understanding them. But I think often you can look like that hero with just one of these aha moments, or a few of these aha moments. I caught the issue because in the model, it's an older-school model from the 90s, we're restricting somebody to one phone number. Well, we can't do that anymore. But we text our customers, and we're texting the landlines, because there's no way to indicate that this is a mobile phone. It seems like such a simple example, but I bet marketing would care about that. I bet marketing could show some ROI. And sometimes just looking at that business-level model, what are we trying to do with data, and looking at the physical model, well, how has it been implemented, and vice versa, can be really eye-opening. Or you might have done the business model and then you start looking at the technical environment, saying, well, I never knew that we still tracked fax number, huh, should we still do that? And again, it's that iterative approach, and you need to do both.
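The phone-number anecdote can be sketched in a few lines of Python. This is only an illustration under assumed names (`PhoneType`, `Phone`, `Customer`, and `textable_numbers` are all hypothetical): the model allows many phones per customer and records the type of each, so a texting campaign can skip landlines.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class PhoneType(Enum):
    MOBILE = "mobile"
    LANDLINE = "landline"


@dataclass
class Phone:
    number: str
    phone_type: PhoneType  # the attribute the older model had no place for


@dataclass
class Customer:
    name: str
    # One-to-many: a customer may have zero, one, or many phones.
    phones: List[Phone] = field(default_factory=list)


def textable_numbers(customer: Customer) -> List[str]:
    """Return only the numbers that can actually receive a text message."""
    return [p.number for p in customer.phones if p.phone_type is PhoneType.MOBILE]


pat = Customer(
    "Pat Example",
    [Phone("555-0100", PhoneType.LANDLINE), Phone("555-0101", PhoneType.MOBILE)],
)
print(textable_numbers(pat))
```

With a single untyped phone column, the filter in `textable_numbers` has nothing to test against, which is the gap described in the anecdote.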
I wouldn't really skip either if we're really going to full implementation. If we're starting implementation, I'd definitely start at that business level, because that helps you scope. The problem with just starting at the physical, and you might have done this, and again you can look like a hero very quickly by creating a physical data model for a database with a thousand tables, is, then what do you do with that? Trying to zoom in and really see what that means, that's where the conceptual and logical can help you prioritize based on the business value. So if you haven't seen a conceptual data model, this is one example. With this particular tool we're using, I like this one because you can actually show the business definitions right on the box, so this almost looks like something you could do in PowerPoint. Well, what's wrong with that? Business users live in PowerPoint, so the more you can make these business data models accessible, the better. You might have seen some of my presentations before, and I don't have any of those cute examples in this one, where I actually do put in a picture of an employee or a picture of a sales rep or a picture of a product, because that really can get to the crux of what these models mean. And so I guess the key thing for you to remember when you're building these is, what is the focus and who's the audience? It's the business, and you want to keep it simple, and you want to keep it focused on the business and business terminology. I've probably made all these mistakes, so I'll just admit them myself, and they're easy to make, so don't make them. One is to get too much down into the logical and physical.
Oh, right there I could say, employee: was that a full-time employee, a part-time employee, what are the different employee types? Should we put in some attributes? It does make sense, you might want the subtype, you might want to put some information in there. Sometimes it's okay to go to that level of detail and then abstract back out again. You might need to go to that logical level to really understand what that entity is, but remember, when you're presenting back to the business: simplify, simplify, simplify. Nothing scares a business person, or really any human being, more than a model of a thousand entities. The more you can keep it simple, the better. It can be frustrating, because people say, well, you just spent six months and you have five boxes. But generally I don't hear that. Generally that clarity is enlightening, and people say, great, I've never had someone explain that to me so simply, thank you. I think the other piece I would recommend is not only the simplicity, but use the business language. That is where you get the aha moment. So just to use this example, we have a sales rep and a support rep. Are those the same things? Are they different? We have a customer. Could there be a client? Is a client the same thing as a customer? What often happens is that people at the academic conceptual level will just say, well, they're all a party, let's just make a party model, because, you know, a party has a name and an address. And yes, that might work to kind of simplify your data structures, but I think you're losing a lot of actual business value, and if you oversimplify, you could just have two boxes, thing relates to thing, and you get nothing. So I would say, when in doubt, use the terms of the business and question them, and try not to use an academic term. We did this at one client. One of the other consultants actually said, let's have the party model, and the client was great. She goes, a modeling party, we're going to have a party! And so we never used the word party on the model, we actually use client
and member, but we did have a data model party, and our workshop had balloons and cake, and I thought that was great. She kind of turned that on its head, but usually that turns people away, because those types of generic terms can often just seem too abstract. So that is where you get things like, what's the difference between a customer and a client? Do we have different relationships depending on that interaction? The other thing, and I mentioned the physical model being a great way of creating an inventory of your data assets, is that you could also do that at the conceptual level. So just starting with, these are all of the, you know, the entities I have in the organization: I have staff, I have location, I have customers, I have weather events linked to locations. This could be anyone from the business saying, I didn't know we tracked that, or a developer going, that's cool, I could also use that for trend analysis. A lot of these different areas come out even at the conceptual level, how people are using data and what information they have, so it can be a very helpful sort of roadmap. A lot of the organizations I work with actually have some of these models printed on their wall with sticky notes on them, and it really is your sort of data catalog of, at the high level, what are we tracking in our organization and what is our organization? I mentioned this before, but when you use these data models, please use the language of the audience. And yes, they kind of start to look like PowerPoints, but they should. They should be simple. A customer buys a product. Let's not overcomplicate that with too much detail, and use business terms as much as you can, because really the goal of a data model is telling a story. And I joke that my colleague Steve Hoberman, who was the co-author of the book that this came from, I think he actually did read data models to his kids in the evening. I think he said his daughter was the only girl in kindergarten that could normalize a data model. I
think that's a good thing. I'm not sure. But we do, we think in stories. We can get the facts, we can get the data, but unless you contextualize that into some sort of so-what or some sort of anecdote, it's not going to resonate. I heard a podcast where someone was talking about how, you know, we're so tied into storytelling. Our ancestors told stories around the fire. We can't even sleep without creating stories in our head, they're just always running. That's sort of how we're wired, for whatever reason. And so, yeah, I don't think you're going to get up there in front of your business stakeholders and say they lived happily ever after, but you do want to relate this to a business scenario. Remember that campaign where we sent too many emails to each customer because we had this problem? Or remember we tried to get a list of our customers and we couldn't because there were duplicates? Remember when we...? This is why. And it's easy to find the problems, but you can also find opportunities. Wow, if we relate weather data to product sales, we could look at this relationship. What could we do with that? We have the relationship between sales reps and customers. Can we look at which sales rep has the best relationship with customers, or whatever? With some of these relationships you hadn't really thought of, it's a great way to brainstorm and have an aha about the business. So use them that way. Create subsets of the models to tell different stories and different scenarios. Just kind of tell that through, and you will get a positive experience from the business if you follow most of those guidelines. And these are some quotes from our actual real-world, living, breathing customers. On that early childhood model, one of the teachers got up and said, this is really elegant, you just summed up our organization in a single page. And I think that is the beauty of it. We get that a lot. How do you know so much about our company? Actually, that's the one over there in the lower right. We
were at a utility company, and I brag about this one in particular not because we're particularly smart, well, maybe we are, I don't know, but it's those data modeling constructs that make you look smart. So we were there for literally a week. We did a bunch of workshops and we played it back at the end of the week. This is a fairly complex organization, a water utility, and they were going digital and they were looking to do artificial intelligence, so a lot of complexity. We showed the model and someone said, so how many years have you been here at this company? That was pretty nice. And I said, a week. Are you serious? How do you know that? And I said, you know, anyone can do that with a data model. That's the beauty of it. You ask some leading questions, you get some of these core nouns and verbs of the organization, and it really clarifies some core concepts. The chief marketing officer, you know, her comment was, now I get why these campaigns weren't working. We're not linking email with the other pieces, we're not capturing some of the relationships. And she had complained that nobody had ever explained that to her before. Everything was very complex. She had a lot of, you know, techie people showing complex diagrams, but no one ever just got to the crux of the so-what, and she sort of liked that. Another favorite was from a technical stakeholder, and we could maybe ask, do we need to show data models to the techie team, or is this for the business folks? It was a VP of software development at the university I dealt with, and they were building a lot of apps and a lot of new applications for students. After looking at this data model, where we had sort of broken out the data model by different user types, different types of students, and, you know, we actually also linked it with a process model, or kind of a customer journey map, a student journey map, he said, you know, I've never seen data from the student's journey before, this was great. And he said it was really eye-opening that he
was just thinking of the data points and not the person, so that combination of a customer journey map and a data model was eye-opening. And the last one you will probably get yourself once you start building these: "We love it, and we have a couple more departments that want one too. Can you do one for them?" Even better: can we use that as a conversation mechanism? And again, this is across all industries. I won't go into too much detail about the environment agency because they did a whole webinar with us on DATAVERSITY, I think earlier this year in February or March. They talked about how these were research scientists who were all doing water samples and air samples, and they were able to use the data model as a common lingua franca between them: is a measurement a measurement, is a sample a sample? It actually showed some efficiencies across the different organizations. That's another example where your quote-unquote business users were research scientists, and they loved it, and it really helped explain things. So data models are great at the high level, with a conceptual model. What this model did, and I know you can barely see it, is that we started at the conceptual level and then used that, as I mentioned earlier, as a roadmap of all the things that are most important: what is critical to start with now? For them it was measurements, and they were also creating code lists and data standards themselves. Those green boxes were the roadmap for where we want to go into more detail. We're not going to do a detailed logical and physical data model on everything, but these are the heat points, the things that are most critical for business value. From that we went, in that particular example and a lot of others, down into the logical, and that's really where you get into your detailed business rules: your data types, your attributes, and your business rules in terms of cardinality and nullability and things like that,
and it does define data structures but not physical tables, and I want to be clear on that. What is your sales hierarchy? Can a part be a raw material, a finished good, or an assembly? There's certain logic in data that has inherent structure, and you're starting to capture that, but you're not necessarily creating tables from this. Some of the tools, especially the ones that grew up in the relational world, as most did, can easily flip this into a table and create your DDL, which is a great benefit. But because of that ease of switching, and because they look so similar, I think some people jump to the physical model a little too soon. Stopping and thinking about just the business rules can have a lot of benefit. So things like: can a customer have more than one address? I have vented about this before, so I'll keep my vent short. I live in a rural area where I have a mailing address where I can't get mail and a PO box where I can, and the number of times I'm trying to order a product online and they can't seem to figure out that one is a shipping address and one is a billing address confounds me. It's getting a little better, I think. There are more and more data patterns out there, and I recommend using them: not necessarily an industry-standard data model, but at least look at them for ideas. You're not the first person who has ever modeled an address or a phone number. Look at them for ideas, because things do change, even something as simple as phone numbers. That second bullet there: is a fax number still a required field? I've been in probably seven data modeling workshops this year where that has come up: do we still track that? And then, to tease, there's the millennial in the room asking, "What's a fax number?" I kind of feel the same way. I often wonder why so many forms still ask for a fax number. I hated them even when they were popular, and they're not anymore. I've tried to fill in some banking forms where they don't let
you go any further on the form without a fax number. Trying to explain to them that I don't have a fax is sort of hard, and in this day and age it's crazy. So that's an example of where data modeling can help. Think of your customer reputation: I'm trying to be a new, hip digital bank and I'm still asking you for your fax number or your landline. Maybe those are valid, but maybe those have gone the way of the dodo and we want to get them off. Again, those are the kinds of conversations you can have. So many of my customers are doing things like digital transformation and using the data model just for that type of contact information. That came up at the beginning of the call with Andrew, and I agree with it. How do we contact a customer? What does that mean in this digital age? Is it your social media account? Do you want a text or do you want an email? All of that can be hashed out in a data model before you start building stuff that is hard to unravel. So you'll see here that you can create simple data models: a customer can place an order, and a product can appear on an order. Even if you're not technical, you can look at this and pretty much understand that we have customers ordering products. Got it. A customer has a first name and a last name. Does that make sense? Should we store it that way? I could go all day on any of these. Just how we store a name can be complex. Do we have one last name? Two last names? Do we want the full middle name? If someone has two last names, do you put them both in the one last name field? Do we call it family name? What does that mean? That's where we start to sound crazy unless you put it in context. At one of my customers this week, a product development company, someone said, "My family is going to ask me what I did all day, and I spent all day asking what a product was." It can sound academic and strange until you start to
see these subtleties, and then it makes so much sense. So I would say pick a particular example and really highlight some of those use cases. But one thing to avoid, and I touched on it earlier: please don't do this sort of death by data modeling. Anyone who has been in the business for a while has either run these or had to sit through them, and I've met a lot of business stakeholders who probably have PTSD or are traumatized from some of these experiences: "I remember those data models; you want me to sit in a room?" I've seen those rooms where the data model covers all three walls and there are thousands of entities. "I know you're busy with your job, but could we just sit down here for a few hours and talk about the cardinality of these two data objects and what the right data types are for this attribute?" Funny, they don't want to come back. So don't do that to anyone, even developers. Instead, can we pick a section of this model and divide it up into smaller chunks, or sprints if you want to call them that, or stories or scenarios? These only make sense in context, and the other problem with having something that big is that you lose the context. No human brain can focus on all of that at once, so you want to keep it reasonable. I mentioned that these logical data models are a great first cut to start thinking about the physical, but they aren't the physical. One of the nice things about a data model is that you can go down to the physical level and reverse engineer to create an inventory of all the data systems you have. You can start to add definitions to those, pulling in further definitions from the comments if they're there, and start to build some database consistency. I know this probably isn't the sexiest part of the job, but it can often be the most valuable. Are we all storing part number the same way? Do we have consistency? Most of the good data modeling tools can create
data standards and enforce them. And before you pooh-pooh that: again, I don't want to throw actual names of customers under the bus, but we worked with a fairly major retailer earlier this year where this actually happened, in the 21st century. One of the developers changed the length of the product code and brought down their retail system for a day. Think of how much revenue was lost because there were no data lifecycle standards for database change management. As you can imagine, that's core master data, and even something as simple as changing the length of a product number can have disastrous effects. That's where a data model review helps: you can do that impact analysis through the model and not have to test it out in the live system. As we get to the physical data model, that is where you're starting to look at the different database structures, whether it's an RDBMS, a document data store, et cetera, and you do want to start to optimize for query performance. Do we want to put it in third normal form? There are great reasons to do that if I want things accurate and non-duplicated; I would hope a lot of your operational systems have some of those checks and balances. Do we want to flatten it out for performance? Do we want a key-value pair? Do we want a dimensional model for reporting? Really think that through. Before you think through the how, think of the what. Do we want to reduce redundancy and increase quality? Is that the main goal for this particular data store? Do we want to optimize for slice and dice? Do we want to optimize for speed of query? Do we have to choose between those? Can we do all of them at once? That's the type of thing you're thinking about when you're down at that physical layer. And then I do want to get to this last point. If I'm allowed just one rant (I had some mini-rants, but this is going to be my full rant): if you leave with nothing else, leave with this, and I
think Andrew touched on this as well. There are so many choices now in the industry, and there are different physical models for different use cases. There is still a place for relational: we've done some DATAVERSITY surveys, and relational databases are by no means going away. They're still the workhorse; something like 70% or higher of most organizations' data is in relational databases, either in the cloud or on premises. Maybe I just want an excuse to use that data modeling cartoon about getting into third normal form because I think it's funny, but there are reasons for it. We could do a whole webinar on normalization, but to sum it up: you're reducing redundancy and increasing data quality because you're doing some of those consistency checks, and they're super important. So many of the simple data quality issues could be fixed with some basic normalization and modeling to really help relational databases do what they do. We generally start with some sort of reverse engineering in our practice when we come into a client and look at the systems, and the number of models we see without any of those relationship lines, where people are really using databases like a spreadsheet, is disheartening in this day and age. The whole point of these relational databases, their power, is not being used without any referential integrity. They're great, they're still storing data, but just make sure you're not modeling it like a glorified Excel spreadsheet, because there's a lot of power you gain by really adding those relationships. Then the good old dimensional star schema, and again this could be a whole webinar in itself: there is still value in those, and we use them all the time. Part of the value is the simplicity. If you've never heard of these, think of a spreadsheet example again: I'm doing a pivot table and I want to see sales by region and sales by sales rep and sales by country. Whenever you're doing a thing by another thing, those are kind of
your facts and dimensions. There's a lot of debate in the industry: is this style of data warehouse dead, with some of these new platforms that are so much more performant and scalable? Maybe you don't need to do this for speed anymore, but for understanding I think they're a great way to put your data in a consumable form, especially for self-service reporting where I want to slice and dice this by this by this. NoSQL: this is an example Andrew showed earlier. It's a great fit for speed of retrieval, low latency, flexibility for change, managing high data volumes, et cetera. And there are so many ways to store data. You could do XML, you could do time series, the good old COBOL. We laugh, but early in my career I had to learn to read those; I never had to write them, just reverse engineer them. They're still alive and breathing, and there are still a lot of working mainframes in organizations, so that's where some of these reverse engineering tools can really help you. S3 buckets, data vault: that again could be a whole webinar on how you can use it for some active storage and have different ways to model your schemas, et cetera. But if I were to say one thing: none of these is inherently better than the others, and it's your use case that drives what good looks like. I rant about that partly because so many folks learn one of these new ones and then, you know, we have a hammer and everything looks like a nail, and people get to be sort of bigoted: "We don't need relational databases now, we do everything in NoSQL," or "We need dimensional now." You need all of them, each for its particular use case. And as Andrew mentioned earlier, what makes it even more difficult is that there's so much overlap now between a lot of the vendors; the database vendors are taking the best of all worlds and mixing them. So that's where I like to go back to first principles: what is my
use case, what are the data modeling concepts and constructs I'm trying to optimize, and then what tool can deliver that? Not to tease a client, but I did: I was working with one up in Canada, and it was one of these hot new platforms, and she was testing it out. I said, "Oh, that's great, what's your use case for that?" and all the guys on the call laughed, like, "Oh, you got her, she just wants to play with the software." That's fine, but I was really asking what she was using it for, to optimize the business case, and that hadn't been thought through. People just wanted to play with a new technology, which I get; we all do. But don't go to production with it until you're really mapping it to the right use case. So hopefully thinking in terms of use case, core principles, and the pros and cons of each platform will go a long way. In summary, the beauty of these data models is that they give you a visual way to look at both business and technical models. Make sure you don't mix and match inappropriately: use your business model for business and your technical model for technical. The technical model is a great way to design for these new platforms. Data models are not going away; they're evolving. They're a great way to have that conversation, either before you go to production to avoid errors or after you've gone to production to manage change. Before we open it up for questions, because I can see it has been a very active chat, as data modeling topics always are because data modelers are passionate, just a quick call-out to the December webinar, where we'll go into more detail on some of these new technologies and how you think about them to build a realistic future-state architecture plan.
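As an illustration of the logical-model ideas from the talk (a customer can have more than one address, a customer places an order, and a product appears on an order), here is a minimal sketch in Python. It is not taken from the slides: the class names, field names, and sample values are all assumptions chosen for the example, and the `OrderLine` entity is one common way to record "this product on this order."

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class AddressType(Enum):
    # Hypothetical roles; the talk distinguishes shipping from billing.
    SHIPPING = "shipping"
    BILLING = "billing"


@dataclass
class Address:
    address_type: AddressType
    line1: str
    city: str


@dataclass
class Customer:
    customer_id: int
    first_name: str
    last_name: str
    # Business rule: a customer may have more than one address.
    addresses: List[Address] = field(default_factory=list)

    def address_for(self, address_type: AddressType) -> Optional[Address]:
        """Return the first address with the given role, or None."""
        for address in self.addresses:
            if address.address_type is address_type:
                return address
        return None


@dataclass
class Product:
    product_code: str
    name: str


@dataclass
class OrderLine:
    # The "thing in between": one particular product on one particular order.
    product: Product
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: Customer
    lines: List[OrderLine] = field(default_factory=list)


# Illustrative data only.
customer = Customer(1, "Ada", "Lovelace")
customer.addresses.append(Address(AddressType.BILLING, "PO Box 12", "Smallville"))
customer.addresses.append(Address(AddressType.SHIPPING, "40 Rural Route", "Smallville"))
order = Order(100, customer, [OrderLine(Product("P-001", "Widget"), 2)])
```

Nothing here is a physical table yet. The point of the sketch is only that the business rules (how many addresses, which role each one plays, what sits between an order and a product) can be written down and discussed before any storage decision is made.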
This is us; we do this for a living. I'll pass it to Shannon to open it up for questions. Donna, thank you so much. Just to answer the most commonly asked question, a reminder: I will send a follow-up email by end of day Monday with links to the slides and to the recording from today, and as Donna mentioned, there will be a link to all the past recordings as well. So, opening it up to both of you, Donna and Andrew: would you view a data model as always being an ERD, or does it make sense to model information as an ontology? I'll grab that and then pass it over to Andrew. Good point. We showed some ERD-like models, but I think that's only one tool in the quiver. For the conceptual model, it's often good to have just that very high-level core concept of what a customer is, what a client is. There are other options: UML models have their own slant and can add other kinds of diagrams as well. Ontologies are great, and they have their use case, but we did lean towards the ERD in this particular webinar. I think ERDs are probably the most advanced in terms of the tools on the market, but it's just one tool in the quiver, and there are a lot of great ways to visualize. Anything you want to add, Andrew, or thoughts on that?
Yeah, I think it does depend. I'll give the consultant's answer, right? It depends on what you're doing and who you're communicating with. I think a lot of times the ERDs are a little easier for business users to understand once you get to a certain level of complexity with the ontologies, because I've seen ontologies where they really went wild. When I'm trying to understand things simply, I think the business users sometimes like the ERDs better, if that makes any sense. I would agree with that. And even within ERDs, I think those of us in the techie world often have our favorites or want to get complex. Within ERDs I'm a big fan of crow's feet, because it's so easy for people to say, "Okay, there's one and there's many," and it looks like the toes of a crow's foot. If you keep it simple, it's an easy way in. We did a survey for one of my books, Data Modeling for the Business, where we took six or seven of the different data modeling notations and asked some random people, like a carpenter and a teacher, "What does this say to you?" The one that everybody could look at and understand was the ERD. So I would agree with you: they do seem to be intuitive. But I want to say one more thing, sorry. Let's not be biased about what a data model is. I've talked to business users who have a spreadsheet of terms that looks like a hierarchy; that's really a data model. As an example, I had a marketing team that had gotten together and drawn out what they called their data flow, and it looked a whole lot like a data model: they had entities with lines between them. So sometimes it helps to take off our data modeler hat and look a little more broadly. It's a thing that has structure and that communicates to different users, and as long as it has those, going back
to that core definition, it's a data model, right? It could be an XML schema; that's a data model. It's not intuitive to me to read one, but there are tools to visualize that as well. All right, next question, before I keep rambling about that one. An order may have more than one product; is it many-to-many? I generally see that that's how you model it out, but order is often one of those intersections, or an invoice is. Often when you have a many-to-many, and you can have an order for more than one product, you ask what the thing between them is. Maybe there's an invoice for that. So that's how I look at it: if a product can also be on more than one order, then I would say yes. That was a very specific question and it kind of threw me off there, but yes, it could be. When you do many-to-many, you just read it backwards, right? Can an order have more than one product? If yes, and a product can be on more than one order, then how would you rationalize that in between? What would be the thing that shows that this is that particular product on a particular order? Next: what might be a good way to split a potentially complex enterprise data model into chunks? A lot of tools are really good at this; they have features for it. I'm a big fan of just creating subject areas. This is part of that story I was telling: I'm going to have the finance model or the sales model, however you define your subject area. Some people do it by "this is all the stuff that has to do with customer, this is all the stuff that has to do with product." Those are your traditional ways, and most of the tools we mentioned earlier in the call have a concept of subject areas, so we break it up at that level. Often they also have a
way to do diagrams or visualizations, and those can be handy. I often call things out with colors. If you're trying to tell a story, that's another way: this is the model that shows why our email campaign is broken, and then you might have a color for "this is where the email goes," that kind of thing. So be creative. The academic way would be to break it into business subject areas, and the more flexible way is by the story you're telling. If you think of the diagram as the visualization and the model, sub-model, or subject area as the metadata, between those two you can start to tell that story a little better. Andrew, do you have thoughts on that? Sorry, my connection cut out during the question. They were asking about a good way, if you have an enterprise data model, to break it up into chunks; I didn't know if you had opinions on that. In the past I've just looked at the overall use cases, and the things that were most related to particular use cases tended to go on one side or the other, if that makes any sense. I think we have time for a couple more questions here. So, using industry-standard asset data models such as... any suggestions on approach or tips for modeling big data? That's such a broad topic; we actually have a webinar on that in the archives. At the conceptual level, I almost see the conceptual as going across all of those platforms. We have customer data, we have product data, and big data is an instance of that. Say some of my customer data is coming in through IoT medical devices or something like that, so patient data; that would be big data. Big data is often, what do you call it, schema on read. Generally there's some sort of schema: you can create a Hive table, that type of thing. You can create some sort of structure and then read it that way, and
then, a lot of the big data space is so broad we could have a whole conversation. Some of the more cloud-based stores, even your S3 buckets and things like that, have some notion of metadata tagging and some light structure, and they have some decent metadata tools right in the platform. I see that more as the physical layer. I would probably use one of those good old ER tools for the conceptual layer, and then it depends on how you're storing that big data: if it's in a Hive structure, it's easy to quote-unquote reverse engineer, and some of the more platform-specific tools are starting to get some good metadata around that as well. And Andrew, before you jump in here, I just want to expand on that, because as you mentioned, the conceptual is somewhat easy and the physical is the physical implementation, but how can we create a logical model for a Hadoop data lake, or a data lake in general? That goes along the lines of that big data question, and it would probably be my same answer as for the conceptual: you still have the core business rules and data layer. I kind of cringe at that term, data lake, because everyone has a different definition, and I guess none of us is talking about a data swamp where there's no documentation. But in a data lake you still need to understand the core business rules around a customer: can they have more than one email, and how is that stored? So if you're thinking logically, you're still thinking data structures. Just an example: we had one customer doing some big data testing, just throwing it out on the cloud with some credit card data for analysis, and one of the analysts said, "Oh, I didn't realize I wasn't supposed to do that because it was PCI." They had actually put the real credit card numbers out there. That was something where a more logical model
would have said this goes here and that goes there. But it is true that big data is sort of schema on read, and you're doing it as you build the data, so it's a different way to look at things. It's not "I build the logical model and then I put it on big data"; it's that you have big data and then you have to put it into a little model structure before you use it. I would say that despite the schema-on-read hype, with Hadoop in particular but also in general, there is some physical structure there, whether it be a file system with directories and files, or Hive, where it pretty much does look like a relational database, although in many ways not a very good one from a database standpoint. Those are essentially physical models, so there is something there to map. A lot of these Hadoop data lake systems have been used for a few different purposes, and the definitions are a little bit different. In another life I implemented a few of them. A lot of them are being used as almost a poor man's Informatica in some cases, so you can go back to some of those techniques, and in other cases they're being used as a poor man's Teradata, so you can look at some of that, but map it more to a file system structure or to Hive.
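To make the schema-on-read idea from this exchange concrete, here is a minimal sketch in plain Python rather than Hive or Hadoop. The raw records are stored as-is, and a small schema is applied only when they are read. The record contents, the `READ_SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical names invented for this illustration.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator

# Raw data lands untouched: every value is a string and fields vary per record.
RAW_EVENTS = [
    '{"patient_id": "17", "heart_rate": "72", "device": "monitor-a"}',
    '{"patient_id": "18", "heart_rate": "88"}',
    '{"patient_id": "19", "firmware": "2.1"}',
]

# The schema lives with the reader, not the store: column name -> type cast.
READ_SCHEMA: Dict[str, Callable[[Any], Any]] = {
    "patient_id": int,
    "heart_rate": int,
}


def read_with_schema(
    raw_lines: Iterable[str],
    schema: Dict[str, Callable[[Any], Any]],
) -> Iterator[Dict[str, Any]]:
    """Parse each raw line and project it onto the schema at read time.

    Fields missing from a record become None; fields outside the schema
    are ignored rather than rejected.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()
        }


rows = list(read_with_schema(RAW_EVENTS, READ_SCHEMA))
```

Even in this toy version, the reader still needs to know which fields matter and what type they are, which is the kind of business rule a logical model would capture before the data is used.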
I love it, and that does bring us to the top of the hour. I'm afraid that is all the time we have for today. Thank you, Donna, so much, as always, for a great presentation, and thank you, Andrew, for joining us today; it's been absolutely great. Thanks to Couchbase for sponsoring to help make all this happen. And of course thanks to all of our attendees for being so engaged in everything we do; we just love all the chat and the questions that have come in today. Again, just a reminder: I will send a follow-up email by end of day Monday for this presentation with links to the slides and to the recording of this session. Hope everybody has a great day. Again, Andrew and Donna, thank you so much. Thank you; it's always a pleasure.