Hello and welcome. My name is Shannon Kemp, and I'm the Chief Digital Manager of DATAVERSITY. Thank you for joining the latest installment of our monthly webinar series, Data Architecture Strategies with Donna Burbank. Today, Donna will discuss data architecture best practices for today's rapidly changing data landscape, sponsored today by CloverETL. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. We very much encourage you to chat with us and with each other throughout the webinar. To do so, just click the chat icon in the upper right-hand corner of your screen to activate that feature. For questions, we will be collecting them via the Q&A section. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DAStrategies. As always, we will send a follow-up email within two business days containing links to the recording of this session and additional information requested throughout the webinar. Now, let me turn it over to our friend Pavel from CloverETL for a word from our sponsor. Pavel, hello and welcome.
Hey Shannon, thank you. All right, so I should be sharing my screen now.
You are good.
All right, perfect. Thanks, Shannon, and thanks, Donna, for having me here tonight. What I would like to do is talk about data integration. I'm from CloverETL. We're the maker of an enterprise-grade data integration software platform. I would like to tie data integration to data architectures, and I believe Donna has a very interesting presentation where she will speak about data architectures. From my experience, I see data integration as the foundation on which the pillars that support practically any data architecture stand. So let me show you a few slides where I would like to discuss data integration. Let's start with the world. We've got data. Let's say people carry information, so people are the data.
Now, people organize themselves into political structures, states, organizations and businesses. That is your data architecture: some form of structure and order that you apply to a group of people. But that wouldn't be possible without communications and means of transportation, and that is the data integration foundation that data architectures sit on.
So what can data integration do? I have two areas where we see data integration being useful. One is the obvious one: operating data architectures. Data integration software gives you connectivity to any data source you need. You don't have to code it yourself; you just get that out of the box. Then you need to make the data from one source fit a target, so you need the transformation capabilities that these data integration tools give you. And the last thing is publishing the data. Once you've transformed something, how do you make it available to applications or users? That can be anything from storing it in a database or a data warehouse to something more modern, like publishing APIs that serve data, or that receive data and push it through some transformation in the reverse direction. So that's your typical operation of a data architecture: ETL, data warehousing and so on.
But there's another use for data integration, and that's working on the data architecture itself, developing it. There are tools in the data integration platform that help you understand the data you have, and characterize the metadata and the characteristics of the data. Knowing your data is obviously an extremely important piece, and data integration helps with that. Data integration also plays its part in modeling data and working with data models, and I'll speak about that in a bit. And then, last but not least, there's testing data architectures. Testing is very well understood and recognized in the software development world, but it's neglected in the data world.
And having some automation and process orchestration allows you not only to build something, but to test that it's actually doing what it should be doing, and that it keeps doing it over time as you keep changing it. That's something data integration helps with as well.
So I've got an example of a customer that we actually worked with, who saw an opportunity to use data integration in quite a curious way. This ties to Donna's field of expertise, which is data modeling. This customer has an operational system, and a team of business users who want to be able to affect and change the data and the data processes themselves. So they started using enterprise data modeling tools, I think it was IDA, to allow these business users to model and make changes to their data. Then, obviously, you need to take these models and implement them in something that's executable. They had a team, or actually teams, of people who were taking over these models and implementing them in their current platform. What we did with them was create a bridge between the data models and the actual operational data transformations. That helped them reduce the time it takes to get from the model stage to something actually executable in production from nine to 12 months of change requests and work with the development teams down to a couple of weeks. This is quite a curious use of data integration, one that's outside the scope of traditional ETL.
So what else can data integration do? This ties to the topic of today's presentation, which is that there's so much change in the world, and things are changing rapidly. You introduce new approaches and new ways of working. And what happens over time in an organization is that you start seeing multiple data architectures emerging independently of each other. Here's a department that builds a data warehouse for their needs.
Yes, it should be enterprise-wide, but it's not. Here's another department working on some data silo that fits them, but there's no overarching architecture that would connect the two. Either you can throw these things away and start over, or you can glue them together. And this is where we see data integration being used more and more, because it gives you agility compared with redesigning everything. It gives you the agility to connect data despite the fact that you have multiple architectures in the organization, each living on its own. When you need to get an overall view, data integration can help create this glue. And this is something that hopefully ties to Donna's presentation.
That's all I have. CloverETL is data integration software that we usually deploy in more complex scenarios where these sorts of things happen: departments working on their own data silos or data warehouses. So if you have any questions, I'm open to answering them, or feel free to visit our website or contact Donna or Shannon if you want to get in touch. And that's it. Thank you, Shannon, for the couple of minutes I had.
Pavel, thank you so much. Always a pleasure having you on board. There was a quick question that came in. I know you have to jump off, so let me just ask it really quickly. What kind of data modeling do you use? ERD, ORM, or ontology?
It's ERD for that particular case. But the modeling part is what they do. When we came in and started this project, it was already established, and then we worked on bridging that into our own data integration.
All right, Pavel, thank you so much. And with that, I will give controls to Donna. Have a great day, Pavel. Thank you for joining us.
Thank you. Thanks, guys.
Now let me introduce our speaker for the series, Donna Burbank.
She is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She is currently the managing director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa, and speaks regularly at industry conferences. And with that, let me give the floor to Donna to get today's webinar started. Hello and welcome.
Thanks, Shannon. Always fun to do these. And thanks to everyone who joined. We've got a good group today. We often get some of the same folks every month, and I appreciate that. Sometimes I'm at a conference or a DATAVERSITY event and folks come up and say hi, and it's always nice to put a face to a name. For those of you who are new or may not have caught some of the past ones, they are all on demand. That's a nice service that DATAVERSITY offers: all of the past ones, I think going back to forever, correct me if I'm wrong, Shannon, are on the website. So here are the ones we've had this year, January through April, if you wanted to learn about metadata or graph or any of the others. And you'll see the upcoming events as well. We hope you can join. This is an ongoing series that we have.
But today, as you are aware, we're going to talk about data architecture, which really is core to the whole series. There are lots of pieces to that. As Pavel mentioned, there's so much change in the industry. How do you keep track of it? It's fun and also kind of overwhelming. At the same time, if you've heard me speak before, you'll know that I'm a big fan of some of the core fundamentals. And that doesn't mean we're doing the same old stuff over and over again.
There are just certain things that hold. There's a lot of new innovation in science, but the laws of physics still apply. That's how I tend to look at it: yes, we can have NoSQL, we can have graph, we can have artificial intelligence, but you still need data quality and you still need metadata. That might evolve and that might change, but you still need it. And the really exciting part, if you've heard me speak, is that I'm a fan of this data-driven business, and that's the real reason we do this. Part of what's confusing with all this technology, I think, is when you use it and why. So throughout the presentation I'll try to tie it back to some real-world use cases. We can all get academic on this, and I might like graph versus key-value data stores, but really the choice should be driven by fit for purpose for your business case.
So what is data architecture? We architects and data folks love to go back to definitions, so I thought it would be worth stating one here. If you're familiar with DAMA, the Data Management Association, they're a great nonprofit organization that focuses on data management, and they have a body of knowledge, the DAMA-DMBOK. So what better place to go than to see what they have to say? You'll see here it's basically the specifications and the guides, and as Pavel mentioned, data integration is a core part of that. But it's also how you integrate, how you design, how you document, and how you create an overall data strategy, which you know I'm a big fan of. So I can't really disagree with much of what they said there, but I do think it goes beyond that.
I've mentioned this in some other webinars, but DATAVERSITY and I co-authored a survey last year on trends in data architecture, and we'll hopefully be making this a yearly thing. We'll give you the link, and Shannon generally sends it out as well when we mention it.
There are some great facts in there, but one thing we thought was fun is that we had some open-ended questions, and we asked people: how do you define a data architecture? Let's just go back to basic principles. I agreed with a lot of the folks in their comments, which went a little beyond what the DMBOK had said. It's really about leveraging data as a strategic asset, which I would agree with. Think of our finances: what's more important to an organization than the money? Or, I guess you could argue, the people. We have a whole finance department to manage our money, and a whole HR department to manage our people. So you need to manage data as closely as you manage your finances, and there are architectures around your books. There are financial principles that go back centuries.
Some also said it's a little bit of an art and a little bit of a science: the art of designing the best future ways of storing the data. But, back to the physics example, there's also some science behind that. There are core foundational data principles that you can base it on. Relational algebra is a thing, and it has been around for a reason. I also like this last one: the discipline to help businesses manage their data to be the most effective, secure, compliant, and profitable. And that really is it. If we're going to say data is a strategic asset, which you've all heard, and more companies are branding themselves as data companies, you have to manage it, just as you manage your finances. No one questions that there are certain financial principles or that there are auditors for finance. I do a lot of finance running my own company, and I can't think of anything more boring. You sum up columns of numbers and you make sure they match. But when you come back to governance, that's all it's about, right?
Making sure things match and definitions match, and that we have lineage and all of that sort of thing. So I kind of like that little extra flavor that people added in the survey. And again, as I mentioned, we'll be having another one coming up next year at some point, and we hope you can give your input. We generally get several hundred people replying, which is nice.
Another piece of this that was in the survey is the need for an architecture, other than the fun of it, which nerds like me do think it is. Why are we doing a data architecture? What are some of the business drivers? No surprises here, but I did find it interesting. Of course, regulatory compliance and governance is huge, at the top there. You are required to have lineage under a lot of the data governance regulations. But I was also pleasantly surprised by how many of the drivers were business-driven: we're trying to get more efficient or save cost. Who has not tried to do a data architecture or programming where you just know you're repeating yourself, and you're doing the same thing six different ways, or you know somebody else has already built it? That really is the efficiency of doing an architecture well. Then there's digital business transformation, reporting and BI, and data science discovery; those are really the hot new things. I've worked with a number of organizations this year and last that want to do digital business transformation. They want to do all of the new hot stuff, artificial intelligence. But you can't do that without a good data foundation. So, ironically, it's some of the new stuff that's getting people back to core principles and the foundational stuff. And they're not mutually exclusive.
If you've heard me speak before, I think that's one of my phrases: the tyranny of "and" and the beauty of "or"... no, wait, I had it backwards: the tyranny of "or" and the beauty of "and". The idea that you use either the core or the new stuff doesn't make any sense. You need them both. To really get to the new things, you have to have some of those core fundamentals.
So again, if you've seen my presentations, I like to show this, and we've gotten some good feedback on it. To me, this is the menu of things you need to look at when you look at a data architecture. Everything ties back to a business strategy. Data is fun, data architecture is fun, but when you're working for an organization, the reason you're doing it is for a business purpose. Something I'm pleased to see, and as you can tell from my company's name, we built a whole company around it, is that more and more people are building not just data architectures but data strategies. I'm pleasantly surprised by that, because it really shows that data is strategic. It's not just how you put the bits and bytes together, but the why: things like governance and how you govern that architecture.
I always show this partly because, as anyone who's been in data for a long time will understand, I'll have a company say, "I really need better governance," which may be true. But really, governance is related to quality. Or I might say you need an architecture to support your governance. If you don't have a data flow diagram or a data model, as Pavel mentioned, how can you govern? You can't manage what you can't measure. So they're all intrinsically linked. People may say, "I want to do master data management," which is great, but that's not going to work without governance, meaning the people in the organization and the process around that.
They're so tied together, yet they're very distinct things, so I like to show them together. When we're talking about an architecture, it's both top-down business strategy and bottom-up: how do I manage big data and databases and video files and documents in a manageable way that doesn't get crazy?
One of the things I like to do is actually just spell it out. Again, this could seem obvious, but, and we'll talk later about roles, there are so many new people in the business. I have to remind myself that when you get to know something, you all of a sudden assume everybody knows it. I learned a word the other day, and now I'm going to use it expecting everybody else to know that word, when I just learned it three days ago. So I think, for those of us who have been in the industry for a while, it doesn't hurt to explain some of this stuff. I have worked for vendors in my past and I'm a fan of vendors; they do some great things. But they can also put some fear, uncertainty and doubt out there, the FUD, that doesn't always match reality. "Nobody does relational databases anymore. Everything's NoSQL." I actually get a little angry at that sometimes, because it just doesn't make any sense. There are fit-for-purpose solutions, and there are things that relational databases are excellent at, and things they are terrible at.
I had actually built something similar to this for a client who was a business sponsor, and they were literally trying to get their heads around all these different systems and why you have so many different ones. So I modified it a bit to protect the innocent. Basically, you have operational data that might be in your CRM system. You may have multiple CRM systems, which are different from an ERP. You could have Internet of Things.
This was a manufacturing company that had sensors on their product, which was interesting, and that's different again from operational and reporting data. I could just stop there and talk about how much misinformation or misunderstanding there is. There's operational data, and there is reporting or summarization data in the warehouse, and they are very different. It's "how many products did I buy," that purchase transaction, versus "how many products did I sell last month by region, by sales rep," which is your typical star schema warehouse. And because warehouse is a nice word, people will say, "Oh yeah, it's in the warehouse," when it's just a bucket of data they have. It's a database, but that's not a data warehouse. I'm the type that always asks people to explain their assumptions, what do we mean by that, which is why I don't get invited back to parties.
It's also different from a data lake, and we'll talk about that. When we think of fear, uncertainty and doubt, what I find interesting, and where I really feel old, is the hype cycle of the lake. It used to be that everything was a lake, and now people say, "Oh, none of those data lakes were successful." Well, that's not true either. You can have successful lakes, but a lake is not the same as a warehouse. A warehouse is structured and formatted a certain way, and a lake, as we'll discuss, has different use cases. I find it funny, having been in the industry for so long, that you hear this about almost every one of these. Master data management, that doesn't work. Warehousing, that doesn't work. Data lakes, that doesn't work. There are always good projects and bad projects. It doesn't work when you don't do it the right way, when you don't have an architecture behind it or governance around it, or when you're trying to use the wrong product for the wrong solution. So again, sometimes it helps just calling these out and asking, what is the purpose?
Is this a reference data set? Is it operational reporting? Is it just limited personal use, a spreadsheet on my laptop? That's fine if it's my wedding list. That's okay. But if it is core customer data that I'm reporting to the street, it probably should not be in a spreadsheet on my laptop, right? So none of these, in and of themselves, are good or bad technologies. It's the fit for purpose.
The metadata, the piece on the right, is core to any architecture. There are a lot of types of metadata, and this could be basic for some of you. If you want more about metadata, we had a whole metadata webinar in March. But there's business metadata and there's technical metadata, and there's a lot of confusion, even though it can be very simple. I actually had an argument with someone just this morning, a client I was speaking with, and he was saying, "That's not metadata, that's a glossary." I said, well, it's a type of metadata. He was thinking metadata meant a data dictionary. We were both right. Those are both types of metadata. A data model has metadata in it. Data lineage is metadata. So there are a lot of flavors of that. But I would go out on a limb and say you can't do any of the stuff on the left successfully without that piece on the right, which is the metadata. That's really the glue and the context that puts it together.
I found this next part interesting, and I'm stealing heavily from that architecture report we did late last year, because it's just so relevant to this topic. This is what people are using today. The next slide, if you're curious, will be what people are going to be using in the future. I do get a little annoyed at the fear, uncertainty and doubt when folks say that relational databases are dead and nobody's using them anymore. Well, you'll see here from the list that everybody's still using them.
When you look at the most common technologies out there, it is relational databases. That doesn't mean it's the only solution, but that will continue. It is the tried-and-true workhorse of the organization. That said, the other big leader is spreadsheets. Yes, that's not a database, but we put it there because it's probably the de facto leading database in the market: it's easy, it's ubiquitous, and people use it for a lot of different things. That doesn't mean it's a great solution. You'll see that the other leaders are things like big data, but there's also still a lot of legacy COBOL, right? Anyone who still knows COBOL can probably make a decent salary. That's not necessarily going away.
But what I found interesting was what's going to be used in the future. Yes, of course, there's still a lot of COBOL and mainframe out there, but that doesn't mean it's what we should be using in the future. So let's look at what people are planning for the future, if I am capable of moving my own slide. A couple of things interested me here. One is that relational databases still don't go away, but the difference is that a lot of people are moving those relational databases to the cloud, which can fit a lot of different use cases and can be great. You'll see that a lot of people are interested in big data and Hadoop-type ecosystem platforms. I think that's especially true for things like Internet of Things, or a lot of unstructured data, or when you're trying to bring in third-party data: weather data, social media, that kind of thing. That's a perfect example of a lake. But that doesn't mean the data warehouse goes away. They work very nicely together.
I have a friend in marketing who's also a statistician, and she always says there's a 5% error in any survey. She did a survey about a brand new product that they made up just as a test case.
And 5% of the people said they had been using it and really liked it. So she says there's always 5% of people who didn't read the question or just didn't care or whatever. It's human error. So I thought it was interesting that about 5%, on the third bar down, said they'll be using more legacy COBOL in the future. Either that is true and people really like COBOL, or that's one of those errors, or I misworded the question. But I found that interesting. You'll also see a lot of IoT, and from my experience, a lot of people are going into that realm as well.
What I was most pleasantly surprised and pleased with is this. When you looked at the previous slide, what people are using today, you saw the spikes. It's relational and, well, I wish we could just take the spreadsheet off there. But here, seeing the diversity, it isn't just, "Yes, there's big data." Everyone's trying things to see what fits. You'll still see relational; if you add those two together, it's still clearly the leader. But otherwise there's almost a flattening. Because there are so many options, I see that as a good thing: people are looking for fit for purpose. These can all be in one organization and work nicely together, and that's where an architecture fits in. It's not just the architecture of that database, but how the architecture of that database fits with your NoSQL, your IoT stream, your media files, et cetera.
Also, a lot of people are just plain uncertain. Look at the number of people who said they don't know. That's surprising. There's a lot of confusion out there, so hopefully this webinar will help clear some of it up. And again, with all of these, you'll probably find a few things interesting and a few things very basic. There's always a broad range of people on the call.
So we try to give a little of each; hopefully one thing will be obvious and one thing will be new to you.
Starting off with big data, because that was one of the biggest, pun intended, answers there. It is a trend. When we asked some specific questions about it in that survey, over 70% of folks were either using big data now or planning to in the future. It's also not a surprise that a lot of the drivers are data science and discovery, that kind of sandbox exploration, and reporting and analytics to a certain extent. In my opinion, the maturity for reporting there isn't as strong as if you use something like a warehouse. I know it's not mutually exclusive; there are things like Hive and relational-type technology you can put on a lake. But I would say that in most of what I've seen, the benefits are in that sandbox, data science and discovery type of use, because of the diversity of data sources you can mash together.
Some of the concerns are valid. You'll see the complexity, and also the skills. One reason a lot of folks go to big data is that there are some cost savings when it comes to the platform. With some of the big relational database vendors, it can get very expensive. But make sure to weigh that against the skills required to manage it. You're going to pay for something: somebody who understands it, or a vendor who understands it, or a cloud solution that handles it, or the software itself. That said, there can be cost savings for some of the massive volumes. But if you're really just looking to switch your data warehouse over to this, I would not necessarily recommend it, because the cost savings aren't necessarily there. Well, some people do have massive warehouses, but generally that's not what people are using it for. It's more like, "Hey, here's my IoT data," or some of these large unstructured data sets.
And security is still a concern, although a lot of the solutions are getting better at that. Historically, that wasn't their strength, but there are solutions out there to manage it.
Again, I found it interesting that the hype cycle has gone up and down so quickly with data lakes. I'm always surprised because I'm always slow with trends. I'm not quite wearing bell bottoms, but sometimes I feel that by the time I've figured out a trend, it's over. Especially with data lakes, it went from "this is the only thing you could possibly do, and this is awesome" to "hey, these things never worked, why are you using one?" I think the truth, obviously, is somewhere in the middle. I saw that in the mixed responses. Some people are using a lake with a warehouse, which I would think is a very valid use case, and the one I'd choose if I had to pick. Some people aren't, because they're concerned lakes are not effective. And there was also a large percentage of people saying, "I really don't have a use case," which is probably very true. Don't just pick one because it's the cool thing. Do you really need it? What is your use case? The same goes for any of these technologies we'll talk about.
I've already mentioned this next one, but again, as a foundational principle, data warehouses and data lakes are very different things. I know that as terms they sound similar. They're both a bucket of stuff: you put things in a warehouse, you put things in a lake. But when I use the term warehouse, I'm really thinking of creating analytic reports on historical data, whether it's a star schema or not, with that kind of slicing and dicing. I want sales by month, by region, by sales rep. That's what I think of as a warehouse, which is even different from an operational data store. It really is for that summarization purpose. A data lake is more the idea of holding a disparate collection of data in its native format.
It could be structured or semi-structured. And the buzzword there is schema on read. You don't necessarily always know where and how you're going to use the data, but the lake is a great store for it. And that's where the talk about mixing comes in: I want to do social media analysis and link it with my warehouse. That's a great use of the lake. They can integrate together, as I mentioned, and again, they're not mutually exclusive.
I put a lot of things in that piece on the right. That's more your enterprise system of record. That could be your data warehouse: these are the numbers I report to the street. I could have my master and reference data there: my master data for customer, and my reference data of region codes or whatever I'm referencing, as well as some local data marts, et cetera. But that doesn't obviate the need for a lake. You might do sandbox exploration, and a lot of those explorations may want to use some of the master data. I want to do some exploration around some of our customer data. As long as I handle PII correctly, I can take it off and do some things there. But there's also a feedback loop back to the warehouse. I have several customers that are doing exploratory analytics, and you might find a variable that's interesting. Maybe we should start tracking social media tags, or interest in a certain area, and put that back in our warehouse as a field.
When these things work well, it's a collaboration between the things that are highly governed and highly modeled, on your right with the system of record, and the stuff that is not highly modeled and is exploratory, as long as you're doing it fit for purpose. That governance layer in the middle is really what makes that thing work. We'll talk more about that, but you've got to govern the right thing in the right way. And when you have both, then you can have that nice reporting and analytics layer at the top.
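
The warehouse-versus-lake contrast described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not anything shown in the webinar: the table names (`fact_sales`, `dim_region`, `dim_rep`), the sample rows, and the raw JSON records are all hypothetical. The first half models data up front in a small star schema and answers "sales by month, by region, by sales rep"; the second half keeps records in their native format and applies a schema only at read time.

```python
import json
import sqlite3


def build_warehouse():
    """Warehouse side: structure is defined up front (hypothetical star schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);
        CREATE TABLE dim_rep (rep_id INTEGER PRIMARY KEY, rep_name TEXT);
        CREATE TABLE fact_sales (
            sale_month TEXT,
            region_id INTEGER REFERENCES dim_region(region_id),
            rep_id INTEGER REFERENCES dim_rep(rep_id),
            amount REAL
        );
        -- Made-up sample data, for illustration only.
        INSERT INTO dim_region VALUES (1, 'East'), (2, 'West');
        INSERT INTO dim_rep VALUES (10, 'Avery'), (11, 'Blake');
        INSERT INTO fact_sales VALUES
            ('2018-03', 1, 10, 100.0),
            ('2018-03', 1, 10, 50.0),
            ('2018-03', 2, 11, 75.0),
            ('2018-04', 1, 11, 20.0);
    """)
    return conn


def sales_by_month_region_rep(conn):
    """Slice and dice: total sales by month, by region, by sales rep."""
    return conn.execute("""
        SELECT f.sale_month, r.region_name, p.rep_name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_region AS r ON r.region_id = f.region_id
        JOIN dim_rep AS p ON p.rep_id = f.rep_id
        GROUP BY f.sale_month, r.region_name, p.rep_name
        ORDER BY f.sale_month, r.region_name, p.rep_name
    """).fetchall()


# Lake side: disparate records kept in their native format (made-up examples).
RAW_LAKE_RECORDS = [
    '{"source": "social", "user": "c-42", "tags": ["hiking", "boots"]}',
    '{"source": "sensor", "device": "d-7", "temp_c": 21.5}',
    '{"source": "social", "user": "c-42", "tags": ["boots"]}',
]


def count_social_tags(raw_records):
    """Schema on read: parse each record and pick out only the fields this question needs."""
    counts = {}
    for line in raw_records:
        record = json.loads(line)
        if record.get("source") != "social":
            continue  # Other record shapes stay in the lake untouched.
        for tag in record.get("tags", []):
            counts[tag] = counts.get(tag, 0) + 1
    return counts


if __name__ == "__main__":
    for row in sales_by_month_region_rep(build_warehouse()):
        print(row)
    print(count_social_tags(RAW_LAKE_RECORDS))
```

In this sketch, the feedback loop mentioned above would amount to promoting a tag that proves useful in exploration into a governed column or dimension on the warehouse side.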
And I do see a lot of maturity in the industry there, with tools and also usage. But there's also, again, some FUD, or fear, uncertainty and doubt, of: I can make the pretty picture, but when I try to get the data right, the data is wrong. So that's why that layer at the bottom is really the core part. It's easy to make the pretty slice-and-dice graphs when you have a nice data set of usable, conformed data below it. And this really gets into that governance piece, or actually the architecture piece: it's really now, more than ever, a data ecosystem. And governance is key to that. This is one triangle I've used before. The message here, and I know it's messy, is that you have a certain set of master data or reference data, and you'll see it's the top of a pyramid. So it's almost by definition a smaller set of data. This might be my reference code list. This might be my product code master. These are my core products that I use worldwide. Let's all use the same information. A layer below that would be maybe things like your data warehouse. A little more volume, which is why it's wider. But again, if I'm going to report some figures to the board or to the street or to ourselves, we want to make sure we're all calculating total sales the same way. And we're using reference data or master data, so that the product codes we're using to report sales are all the same. This part here should be pretty governed. It should be fairly structured and understood. As you get down to this functional and operational data area, this could still be in a relational database. You could still govern it. But it's a little more exploratory. It may be used by individual groups. So maybe this is the marketing team's data mart. And we don't necessarily have to have corporate-wide governance for it, unless they're going to use data that everyone else is using. So a little bit of autonomy: leave people alone. It could be some exploration.
Maybe they're doing some R analytics on some open data and they're just doing some stuff. Let them do their stuff. They're not hurting anybody else. They have the right level of governance that applies to them. And then there's this exploratory layer, which sort of by definition could be very broad. Maybe this is your streaming Internet of Things data, or it could be your call logs from the support center, et cetera. It could be external weather data that we're looking at: how does that affect claims in insurance, or whatever? So that is by nature exploratory, and the governance is less. Almost like: the more it's used, which I think of as what I have at the top, the more it should be governed. And the more it's exploratory, leave people alone. But what I've seen a lot more folks do, and it works really well, is this sort of collaboration loop. Maybe you're finding something in this exploratory layer that really makes sense to start bringing into the fold and governing more closely. Also, there's certain master and reference data that people want to check out and use in their exploration. I want to see how product sales are affected by weather. Let's use the product codes that are master data. Right. So it is an ecosystem: data can be promoted up, it can be promoted down, it can be reused, but you have to have some sort of governance and processes to really make that work and make that sing. The other thing we've talked about in an architecture is not only the technical but the people: not only the process around governance, but the literal human beings that are using it. And there is this rise of the self-service user, where it might be a business analyst kind of person. It's not always someone with a computer science degree, but the tools have gotten so much better. And a lot of us in tech get a little snobbish, as if we're the only people that can understand this stuff.
It's just that, when it comes to data, data is owned by the business and tech. So yes, be the data advisor who explains what the difference is between a warehouse and a lake. But sometimes the business people can have some really great insights. I had one client, and she found her IT team to be rather dismissive, and they were. I saw it in person. And she finally was so frustrated. She's like, I was on the math team, guys. I'm nerdy too. I'm just nerdy in a different way. She was actually in marketing, but there's a lot of statistical analysis there. So it's allowing those people to be their own type of nerdy, but also allowing IT to do the things they're meant to do. And it's this, again, ecosystem for self-service, where this person is probably the first person in the world to say: yes, if you have master data that's right and accurate and a warehouse that's correct, I would be the first person to sign up to use that data. So make it good. Make it easy to get. Yes, please. There's metadata and glossaries and definitions that I would love to use. I'll even contribute to them if there are some nice collaborative tools. And I don't use that definition of sales, and I'll explain to you why, right? But the value to them is that they want to be able to integrate that with their own analysis and models. And there's a new trend. If you've heard about it, this is my next slide: there's this top-down governance of "this is what thou shalt use", and then there's a bottom-up governance of "hey, we're doing some cool stuff, let's share it". Both have their place and both can be effective. The way I look at it, top-down is kind of the encyclopedia, where there are some old gray men in the room defining what a penguin is, and we're going to put it in the encyclopedia, and this is the forever truth of a penguin.
So there are certain things where that's great. Maybe not a penguin, but science, certain things that you want vetted and you want published. That's your master data. That's your reference data. But then there's something that I will be the first to say: deep down inside, and if you've met me you may be surprised, I am a grouchy old man. I say, yeah, it's never going to work. I can sometimes be the curmudgeon. So when things like Wikipedia come out, it's like, yeah, it's not going to work. It seems crazy. But I have to say I use it a lot. And it is that kind of eventual consistency, that wisdom of the crowds. So yes, there can be errors, but there can be errors in the encyclopedia too. The old gray men never talk to anybody and things get stale. So both have their place. And where I see this working really well is self-service data prep, where people share queries. You could say, thou shalt calculate total sales this way, and everyone's like, yeah, whatever, we do it this way. And you can do statistics and find that 90% of people are doing it another way. Well, maybe you need to vet that, but at least you know. Or even better, when this really works: hey, this is how we're doing it, want to share this query? And again, you can enforce top-down sharing, or you can just say, hey, try this. It's that mix of the two, and using them in the right way, that can really work well. But part of that, and I thought this was an interesting question we asked, and I was curious what the answer was going to be, is: who defines the data architecture? And this could scare you or make you hopeful, depending on how you look at the world. No surprise, and it's almost self-defining. And this was a question where you could pick as many answers as you wish. So the data architect defining the data architecture is sort of a syllogism, almost. No surprise there. But it's the wide range of others that I think is good.
And we didn't get into much more granularity here on how they work together. But if it's done correctly, this can work really well. The data architect would be the one responsible for this, but they're also talking to the governance person. Sorry. Phone in the back. The governance person, the business stakeholders, the techie folks, the DBA as well, right? There are a lot of roles that should be involved in the data architecture. I wouldn't necessarily say that the data governance person should define a data architecture, but they should be involved, because it's going to affect them. And the key is the collaboration. Because there are a lot more people using it, like the self-service user, and you don't want to lock them out of the decision. So I found that rather interesting. A bit of a non sequitur, but an interesting piece of this is the idea of moving to the cloud: a growing trend, and some great things about it. I have more and more companies I work with that are "cloud first": unless there's a good reason not to, we'll move to the cloud. Partly for the scalability, and cost savings could be a reason for that. It definitely is a growing trend. Definitely something to look at. We don't all need to have our own server to make things work. But it's not without its concerns. So yes, I mean, cloud is just a machine somebody else is managing, right? I'm getting way off track, but I'm known to do that. When I was a child, I grew up in the Boston area, and you could have cloudy days. We had a place near the ocean, and I'm walking the beach and it's so foggy and nasty. I think my mom said, oh, we're in a cloud. I was like, oh my gosh, that's what a cloud is? And she was right. We were literally in a cloud. And when you're a kid, you think you could jump up in the air and clouds are bouncy, fluffy things you could play in, and one day I'd go up on a plane and be able to bounce in the clouds.
I think of the same thing when I hear so many people who aren't technical. They're like, let's put it in the cloud, it's magic. Cloud is really somebody else managing your data. So think of it that way. And so there are security and privacy concerns, and SLAs, and things like that. So yes, it can be great, especially for scalability. I had one client that was in retail, and they have a lot more volume in December than at any other time, and they're able to scale up and scale down really quickly. So there are a lot of great reasons for it, but do think of it as not fluffy things up there you can jump up and down on. I know that was weird, but I had to share that. Okay, anyway. So master data management, just going through the laundry list. Again, in that spirit of what's old is new again: years ago, that was going to be the be-all and end-all. And I'm seeing a massive resurgence in master data management in terms of our practice. I would say strategy, governance, and master data are some of the big ones that people are asking for, because it's needed, right? So, just like everything else: data lakes, they fail, and data warehouses, they fail, and master data, I mean, insert technology here, it can fail, right? You know, falling down on my bicycle doesn't mean bicycles are bad technology. But I would say particularly with MDM, the technology can be complex, but often it's also the governance and the people and the process around it. So yes, it can fail if not done well, but done well it can be immensely successful. And for those companies who want to be a data-first company and have a single view of product and a single view of customer, et cetera, you really just need master data management. So if you're not familiar with that, that really is what I just said: that single view of customer, that single view of product, but there's a lot of complexity and subtlety to that. Again, not the same thing as a warehouse.
I do get that misconception a lot. If you think of a warehouse where you have sales by customer, by product, the master data is almost your conformed dimensions, to use a nerdy term, right? My dimension of customer can be fed from the master data system. You'll see here over 65% of the people want to do them. And that is another benefit of master data. It's not a static system. I'll talk more about that, but you have the master data, and it can feed your operational systems, it can feed your warehouse. It really is that single source of truth that everybody should leverage. So the real benefit of master data is not only the opportunity of, you know, do we know who Stefan Krause is, and do we have his right age and his right address and that kind of thing, but almost this idea of that 360 view of a customer. Stefan Krause is 31 years old. He lives in Pontresina, Switzerland, but he's also purchased this amount in outdoor gear and he's done these purchases online. He likes to buy online instead of in the store. He likes to be contacted by text message. Or, if we're going to external data, he won the Engadin Ski Marathon in 2015. All these kinds of things you can find out, which is, again, a bit beyond what some people consider master data. And I can live in both camps, right? There's the master data of: do I know that this Stefan Krause is different than Stephanie Krause, his sister? And can I augment that? Maybe some of that is in a lake, right? Or in a graph, and we'll talk a little bit more about graph databases. You can learn a lot about your customers, either through your own sources or through other sources. He's a ski instructor at St. Moritz. So there's a lot of good information you can get about your customer, but you can't always do that. One of my clients was a big insurance company.
People like us, you know, high net worth individuals that had not only house insurance for all of their houses, but yacht insurance and company insurance for their corporations and all of that. And they did some really interesting analytics on social media and a lot of these things. And when they came back to link it, they didn't know if Stefan Krause was Stefan Krause the millionaire or Stefan Krause the ski instructor who makes no money. So they had a master data problem. You know, they were doing all the great analytics, but they couldn't link it back to who their core customer is. So you need both of those to work together. The other piece that I mentioned when it comes to master data is that idea of process and governance. This is a customer we worked with that was a restaurant chain, one of my more fun clients, I must say. Basically they called us in to do governance and master data. I always like to look holistically and say, what's your business problem? So I felt like I was in Mister Rogers, and I'm totally showing my age, where he would say, today we're going to go to a pencil factory and see how pencils are made. I literally went to the test kitchens. I talked to the chefs. We went to the supply chain folks and saw their buckets of eggs. We went to the restaurant and looked at their point of sale system. And what I found fascinating is that when we talked to marketing and we talked to the chefs, marketing had a data flow diagram on their wall. And they said, we can't get our menus right. We don't even know if the cheese on this menu is the same cheese that product had put in the thing, and it's going to affect our supply chain, which is going to affect costs. The chefs had a picture on their whiteboard that, to me, was a data model, and they said, we have all these ingredients and they're not stored centrally. So basically they needed a master list of their ingredients and their recipes and their menus.
And we basically did this massive process flow diagram, even with pictures like this. And we sold it up to the CEO and she got it. She's like, okay, I didn't understand master data, but when you showed me this flow, that the chef makes a recipe, and then it's sent to marketing to name, and then it's sent to supply chain to cost, and then a customer buys it, and if you have the wrong cost, you're going to lose this money. That I get, because that's my business. So again, it was a difficult technical challenge, because if any of you are trying to integrate any of these systems and know which back ends I'm probably talking about, yeah, that was hard. The harder part was getting that story together and getting a governance team together. We actually added governance to their product launch. They had a team where, anytime you launched a product, you could bring up issues, and we added data to that piece. So again, the hard part was getting the business value there. So, a little more on transaction versus master data. Not that this is necessarily a 101, but they are different things. Back to that fit-for-purpose type of conversation: the transaction is going to be the fact that Stefan Krause bought a Scarpa Telemark ski boot. You know what my hobbies are here. At a certain price, he bought it in St. Moritz, Switzerland. I bought the same ski boot in Boulder, Colorado. What price, what date, what time: that's your core relational transactional data. The master data is going to be: do we have a single view of customer, that Donna Burbank has a certain zip code and age and all that sort of thing? Are my state codes right, do I know that CO is a state code and CH is a country code for Switzerland? Do I have euros versus dollars? These are all that core master and reference data, which is very different than your transactional data. Could be 101 to some of you, but I found this helpful the first time someone showed it to me.
So hopefully it's helpful to you. So again, I'm not one of these folks that love to knock relational databases. They are really good for what they are good for. If you're trying to manage your transactional data, they are excellent for that. And there are certain things that other systems don't do, so think of that. One of the great things about relational algebra is this idea of referential integrity and normalization, to get totally nerdy. You get ACID transactions, so that if I want to reverse this purchase because my ski boots didn't fit, you can do that, right? Or I want to know that this product is linked to this product. It does all of that really well. If people know SQL and you want to query it, it can do that really well. So for that type of thing, relational database any day. There are pros and cons and all that sort of thing, but that's what they were built for. That is slightly different than master data, which can very well be done in a relational database. There are also master data systems now done with graph, which we'll talk a little bit about. But again, this is a little different than a warehouse or anything else. That's really thinking of it as your golden record, which is also different than your reference data. So I could have Donna Burbank, and that could be my golden record. I could have reference data around that, like the fact that I'm in Colorado, which is CO, which is different than MA, Massachusetts, where I was born, all of that sort of thing. But all of these systems have information about me: the CRM system, the fact that I bought stuff in the store, the fact that I paid for it and it went through finance, the fact that I clicked on an ad in marketing, et cetera. Then, to get into the warehouse, you have to do all that matching and merging. Did I type in D-O-N-A instead of D-O-N-N-A? Did I not put in my address one time? All that matching and merging and cleansing, to know that that Donna Burbank is me.
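To make the matching and merging idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the source system names, the field names, the email address, and the 0.85 similarity threshold are all hypothetical, and a real MDM tool would use far richer matching rules than a fuzzy name comparison.

```python
from difflib import SequenceMatcher

# Hypothetical source records; system names, fields and values are
# illustrative only and do not come from the talk.
records = [
    {"source": "crm", "name": "Donna Burbank", "state": "CO", "email": ""},
    {"source": "store", "name": "Dona Burbank", "state": "CO", "email": "donna@example.com"},
    {"source": "finance", "name": "Stefan Krause", "state": "", "email": ""},
]

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy name comparison; the 0.85 threshold is an arbitrary assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def match_and_merge(rows):
    """Group rows whose names look alike and keep the first non-empty value per field."""
    golden = []
    for row in rows:
        for candidate in golden:
            if similar(candidate["name"], row["name"]):
                # Merge: fill gaps in the golden record from the new row.
                for field in ("state", "email"):
                    candidate[field] = candidate[field] or row[field]
                candidate["sources"].append(row["source"])
                break
        else:
            # No match found, so this row starts a new golden record.
            golden.append({
                "name": row["name"],
                "state": row["state"],
                "email": row["email"],
                "sources": [row["source"]],
            })
    return golden

golden_records = match_and_merge(records)
for record in golden_records:
    print(record)
```

In this sketch the "Dona" typo is folded into the "Donna" record and the email from the store system fills the gap in the CRM record, while Stefan Krause stays a separate golden record. Survivorship rules (which source wins when values conflict) are deliberately left out.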
And then it might be used in a new application, when someone's trying to look me up to sell me stuff, or they could ask, how many products did Donna buy, how many ski boots did Donna buy last month? And that's where you want to model and have a warehouse and summarize, which might be shown in your BI report. So that's one of the nice things about MDM: you can link it to the source, get that golden record, publish it back out, and put it in the warehouse and summarize it. On that note of graph databases, some graph companies are using that idea of an enterprise knowledge graph, or recommendation. We can go back to Stefan and not me: he bought a downhill ski boot, and you can do that graph link. A graph is just the relationships between points, thing related to thing. And we have a whole webinar on that if you're more interested, I think it was last month. But you could say, oh, you bought a downhill Scarpa ski boot, and I see you won the Engadin Ski Marathon, so you might want to buy a cross-country ski boot that's very similar and in your size and on sale. So with some of those things, again, we're still talking about a customer and the same business use, but a very different technology, and they can augment all of the other things. Again, it's not going to replace your relational databases, but it's a great augmentation. It could be fraud detection, it could be recommendation engines, linking social networks together, the fact that I know Stefan and we ski together all the time and buy similar products, or whatever, right?
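As a rough illustration of "thing related to thing", the recommendation idea can be sketched with a plain in-memory graph in Python. This is not how any particular graph database works; the relationship names (`BOUGHT`, `SKIS_WITH`) and the edges themselves are made up for the example, and a real system would use a graph store and its query language.

```python
from collections import defaultdict

# Hypothetical edges of the form (subject, relationship, object).
# Names and relationship labels are illustrative assumptions.
edges = [
    ("Stefan", "BOUGHT", "downhill ski boot"),
    ("Stefan", "SKIS_WITH", "Donna"),
    ("Donna", "BOUGHT", "downhill ski boot"),
    ("Donna", "BOUGHT", "cross-country ski boot"),
]

# graph[node][relationship] -> set of related nodes
graph = defaultdict(lambda: defaultdict(set))
for subject, relation, obj in edges:
    graph[subject][relation].add(obj)

def recommend(person):
    """Suggest products bought by people this person skis with, minus what they already own."""
    owned = graph[person]["BOUGHT"]
    suggestions = set()
    for friend in graph[person]["SKIS_WITH"]:
        suggestions |= graph[friend]["BOUGHT"] - owned
    return sorted(suggestions)

print(recommend("Stefan"))
```

The point of the sketch is only that the recommendation falls out of walking relationships (person to friend to product) rather than joining tables. The edges here are directed, so Donna gets no suggestions unless a `SKIS_WITH` edge from her to Stefan is added as well.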
NoSQL: again, we've got a whole session on this, I think maybe we do one this year. But one use case, when we're thinking of customers: if I'm thinking of, say, a key-value pair, it's terrible for metadata, terrible for asking how many products he bought last year and getting metadata on it, but what it's really great for is things like: he's searching through the Scarpa website. I'm totally plugging this, they should give me a commission. So he could be looking at ski boots and then go to rock climbing boots, and then he went to the size, and the size is different, blah, blah, blah, right? So that web session log, user preferences, profile, things like that are an excellent fit for that key-value, NoSQL type of information. Data models, near and dear to my heart, right? For relational databases, and we've got a whole session on that, these are perfect. But even when we're in the NoSQL world or the graph world or whatever, there is benefit, even just at a business layer. I'm doing this right now for a major manufacturing company that's on all continents, and they're massive, and for them it's just doing something like: what big buckets of data do we have? We have customer and we have product and we have invoice and we have schedule. What I'm doing with them literally right now is a data flow, saying your lake has this and your NoSQL store has this. Just mapping these big subject areas to what technologies you're using can be amazing, especially when you think of anything at volume. So I'm always a fan of the big, high-level conceptual model. I've never seen a business person not like them. No, I had one who hated it. But in general, people like this, because it's that high-level schematic of the data that you can then overlay onto different things, which leads me into my following slides. One is just that data modeling is cool and everybody's doing it, and it's the sexiest job of the 21st century, backed up by data.
Anyway, over 96% of the respondents do it. I know they are DATAVERSITY kind of people, so by definition of course they do modeling, because they're smart, but I hear again a lot of fear and uncertainty that no one does data modeling anymore. This is crazy talk. Yes, you might not need to model up front with NoSQL. That doesn't mean that we're not doing enterprise data modeling, or that it doesn't make sense for a relational database. Rant over. Want more ranting? Call me up. I can rant about it for a long time. But the other benefit of doing this sort of high-level business data model is the relationship to other things. So yes, if I were just doing a relational database and this is all I did, it may be too simplistic and silly, but the key, especially with complexity, is simplification of that complexity. One thing I'm a fan of, especially with things like MDM, is process models. It could be anything, it could be governance, it could be data quality, and showing these swim lanes. I have sold governance and data quality tools, everything, up to business folks by showing them the flow of their data when they do product development, and where it's affected. I'm working with a customer I'll talk about in a bit where it was just showing all their contact data, where it is along the process, who updates it, and who does all that. So showing the need, this is the "so what" factor, right? When do we use these different pieces? Actually, this was an anonymized version of that recipe example, right? So we're doing the product development, we know certain product components, and then we do costing of that product, and then we market it, blah, blah, blah. And it really shows the interaction between people, which is really helpful with governance. When you think of your swim lanes, these could be your data owners and your data stewards, right? What is the supply chain responsible for, and then how do they interact?
Another thing that I've done, which is almost core enterprise architecture, and that relates to data architecture, is a business capability model. Again, this could be overkill for a small company, but for large companies, if we're just thinking of customer data: who uses that? Product management uses that, sales uses that, marketing uses that, and a lot of other folks in the organization can use it as well. So why would that be important? One of the big companies we worked with was this massive insurance company that was on both sides of the pond, in Europe and the U.S., and one of the main reasons for their merger, from the CEO in their merger call, was to share data. Think of insurance: it is almost entirely driven by data. They are basically doing research on data for cost and risk and all that sort of thing. But just trying to get a sense of those combined data assets was a massive effort. If you've ever been involved in a merger, you'll see one of the issues is redundancies: we have two HR departments now, we have two sales departments, and two of whatever. So a lot of it was going through their data estate. We created some business capability models and did that data overlay. So, do we have similar departments? That was more of an enterprise architecture exercise, and then we did the data overlay, and then we did some of these other data models, process models, and data flows. But just starting here and saying who's using what, and what types of data, was that initial roadmap that was really helpful, which was that picture. Similar to that, and I think I've talked about it before, is this idea of a CRUD matrix: where is data created, read, updated, and deleted? It's an old-fashioned thing or a foundational thing, depending on how you want to look at it, but it's so incredibly helpful to see this reuse between areas and processes and things like that. So I'm always a fan of that, and it can highlight a lot: this is why data quality is a problem, because product development
creates it and somebody else updates it, and they shouldn't be, something like that. Process models and CRUD fit really well together. So I receive an order, I process it: who's doing what, where, when, and how is super important to things like governance and data quality. And I am a big fan, and I will fight you tooth and nail if you disagree. Architecture can be seen as old and slow and not necessary, but I think when you do it well, it really does help create these quick wins. And there's a case study, really quickly, of where we just did this, where you pick the right use case, and actually I'll jump to it now in the interest of time. This was a big retail company we work with, and they were growing quickly. They were not agile in the capital-A Agile framework sense, but they just did everything fast: we've got to sell more now. That said, they realized that their data was causing them difficulty in doing things. So given that environment, where it's privately owned, fast paced, we're going to make money now, what's the minimum viable product we can do just to fix some data quality? In that environment, we started out with a process flow diagram, a CRUD matrix, and a data flow diagram, and we just picked one story and said, you know, you can't even get your email address right from when a customer looks at your product, to when they buy it, to when they get their first support, and we had some call-outs of what issues that was going to cause. They loved it, right? We actually brought that to the head of marketing, and she was great, because she said, you know, I never thought I would have used the words data flow diagram, but I get it, that's why we can't serve our customer. And so they actually ended up being one of the most architecturally advanced customers I have, but they did it in small little chunks. They highlighted the problem with just enough. They didn't do an enterprise data model, they did a data model of contact. They did a process model just for this one piece, and it was amazingly
beneficial, and that's one of the reasons I'm a fan of these models. I've said this before, and I'll reuse my own joke: it's that faster time to light bulb. You highlight a few things on a model, I literally do call-outs like that, of "could we link these things together?", and it's "oh, there's a problem in the product process", or whatever. So, big fan of the architectural components. Again, successful architecture is using that fit for purpose, the right tool for the right job, collaborating between different roles, and a big part of that, even with all this new hot technology, is some of those core fundamentals. Doing that right really helps get you the right business wins. So I'll open it up for questions, because I know I talked long. I'm sure Shannon will send out the link to this white paper at the end, and please join us next month, when we're going to talk about artificial intelligence. Thank you so much, Donna, for another great presentation. And yes, I will include a link to the research paper as well, and you can find it on dataversity.net. Just answering the most commonly asked questions: in addition to sending a link to the research paper, I will send a link to all registrants to the recording and the slides from this presentation today. And just to let everybody know, and to remind everybody, there is a U.S. holiday on Monday, so that email may not go out until end of day Tuesday. Just FYI there. So Donna, diving right into the questions here: I didn't see in-memory data grid on the list of data stores. Is this platform part of the data architecture? Yes, that would definitely be part of a data architecture. We could add that next year. In-memory can definitely be a piece, so you just want to have that fit-for-purpose use case, but yeah, definitely a new technology to look at, or not even necessarily new, but a technology to look at, yes.
Everyone is being quiet today, not a ton of questions, so let me go through the chat here really quick and see if there's anything additional that I can pull out. I see one while you're looking, because it comes up a lot. Oh, it was actually for Pavel, but I already warned him I'd answer his questions for him. This comes up a lot: does it always have to be physical integration, or could it be virtual? This is not answering for Clover ETL, but just in terms of an architecture, I think both have their valid place. I think starting with that high-level architecture, physically the drawing of the architecture, can help with that. You can't do virtual without having that map. And then it makes sense: you don't always have to move something into a warehouse to integrate it. It doesn't always have to be physically integrated, as long as it is planned virtual integration, or you use one of the virtualization tools as well. All right, and where would I go for more in-depth detail on a few of these slides? Do you have a book? Well, feel free to drop me a note if you have a specific question. I do have two books: one is on high-level data modeling and one is on detailed physical data modeling, so conceptual, logical, physical. None on this particular topic of data integration. I would also point you to some of these webinars, either past or future, where, for example on graph, we have a whole session going in depth. Or if you have a specific question, just drop me a note. Yes, and as Donna said, they are all available on demand on the website, and the follow-up email will contain a link to the on-demand recordings from the past. Let me see here. Everyone is kind of slow today in terms of submitting questions. Oh, but we are right at the top of the hour, so actually we are out of time. I was so into it, I'm like, I'm ready to ask more questions, but that does bring us to the top of the hour. Donna, thank you so much again, as always, for another great presentation. Thanks to our attendees for being so engaged in everything we do.
I hope everyone has a great day. Again, a reminder: I will send a follow-up email by end of day Tuesday for this webinar, because there is a U.S. holiday on Monday, with links to the slides, the recording, and everything else requested here. So thanks, all, hope you have a great day. Thank you. Bye.