And here we go. Hello and welcome. My name is Shannon Kemp, and I'm the Chief Digital Manager for DATAVERSITY. We want to thank you for joining the latest in the monthly webinar series, Data Architecture Strategies with Donna Burbank. Today Donna will talk about cloud-based data warehousing: what's new and what stays the same.

Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar, and we very much encourage you to chat with us and with each other throughout the webinar. To do so, just click the chat icon in the bottom middle of your screen to activate that feature. For questions, we will be collecting them via the Q&A section in the bottom right-hand corner of your screen. Or, if you like to tweet, we encourage you to share highlights or questions via Twitter using #DAStrategies. And if you'd like to continue the networking and conversation after the webinar, and to learn more about Donna, just go to community.dataversity.net. As always, we will send a follow-up email within two business days containing links to the recording of this session and additional information requested throughout the webinar.

Now let me introduce to you our speaker for today, Donna Burbank. Donna is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She currently is the Managing Director of Global Data Strategy, Ltd., where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa, and speaks regularly at industry conferences. And with that, let me give the floor to Donna to get today's webinar started. Hello and welcome.

Hi Shannon, it's always a pleasure to join you guys on these webinars, and thanks to everyone for joining, probably from home these days.
So to kick off: as you know, this is a monthly webinar, and we love to see the familiar names on the list of those who have joined us for previous ones. If you have not, and any of those previous recordings, even from previous years, are of interest to you, as Shannon will mention at the end and in her follow-up, these are all recorded and available on demand, so you can always catch either this one or previous webinars if you wish. And of course we would love to have you join any of the others in the list in this monthly series if any of them are of interest to you.

But today the hot topic is cloud-based data warehousing and what's new. There's a lot new in the industry, but then what does stay the same? That may be top of mind to a lot of you who have been in the industry for a while and are seeing this new technology come on board, and wondering what that means. What's nice is that data warehousing, despite the myth of its demise, still has a strong place in today's organization. It's not going anywhere anytime soon. One could say that it's been around for decades, therefore it's outdated; or one could say it's been around for decades, therefore it's a very strong foundation that clearly works, right? Yet things are changing. So how do things like cloud change the world of data warehousing? How do things like data lakes, streaming data, all these new technologies fit in? So: what's changed, what's stayed the same, and what you really need to start thinking about as you look at these.
So DATAVERSITY and I do a yearly survey on business data management, and these are some of the highlights you'll see from that survey. We ask, what are the drivers for your organization, and some of the key drivers you'll see relate directly to reporting and analytics. When we asked what your drivers are for data management in general, it could have been anything: it could have been operational excellence, it could have been the digital business, etc. Those did come up, but the absolute number one, at over 80%, was this idea of reporting and analytics.

We shouldn't be surprised. 87% of those were using both BI and a data warehouse, which is nice to see. You might think they go together, but unfortunately I still see a lot of clients that bring us in who may have a great front end on the BI layer but may not have a proper data warehouse in the back end. A lot of these modern tools let you do really fancy things from a spreadsheet or from flat files, but as you probably know if you're on this call, that's probably not the best way to scale across the enterprise. So it was heartening to see that equal measure of data warehousing and BI.

Data lake, I know, has been a hot topic for the past few years. We asked about data lake adoption and whether people were using it on its own or in conjunction with a warehouse, and you'll see that the majority of folks who do have a data lake in this particular survey use it in conjunction with a warehouse, which is what I like to see, because I think they both have their place and it's not an either/or. We will talk about that in this presentation.

As I mentioned, one of the top drivers when we look at business drivers was this idea of gaining insight through reporting and analytics. That does not mean some of these other things are not important as well. So if we look at saving costs and increasing efficiency
or reducing risk, you'll see some of the other big ones there: digital transformation, customer interest, etc. When we look to move to the cloud, saving cost and increasing efficiency is a very valid driver for folks looking at platform scalability, so I see those as joined nicely together. When we look at reducing risk, and we'll show in some survey results coming up that it is a concern, there are pros and cons to the cloud, and some folks are still on the fence, especially if they have protected data or secret data that they're a little nervous about. Maybe that's valid and maybe it's not; the jury's out on that. But those are definitely, I don't want to say competing, but corollary drivers.

So there's a lot of talk of cloud, but how much is actually in use, and who's starting to look at it? And in terms of going to the cloud, cloud is just the platform, right? You could have a relational database on the cloud, you could have a column-store database on the cloud, you could have a bunch of flat files. If you think of an AWS bucket, you could just be storing streaming data files and dumping them there. A lot of folks just say, we're cloud-first, or we're moving to the cloud, but as we know, there's a lot of nuance to that.

Clearly the leading technology out there still, especially for data warehousing, where it makes a lot of sense, is the relational database, and still today on-prem has the higher ranking. As I said, we've done the survey for the past few years, and that has been the case for the past two or three years: relational is still the go-to for what people are currently using. But when you look to the future and what people are planning to use, you'll see that not only is relational still high, which is fine, especially
when we're talking about data warehousing (there are other options, and we'll talk about those), but you'll see that cloud is definitely edging out relational, though not by much, right? There's no clear winning horse.

So again, there are pros and cons. It doesn't even have to be either/or; you can have a hybrid approach. If you notice, the question was "select all that apply," right? So you don't have to move everything to the cloud. It could be that your sandbox data goes to the cloud, etc., or it could differ by subject area. You don't have to do one size fits all.

If you've been on these calls before, you might have heard me talk to this slide from a different perspective, but it isn't just relational databases. Because there's so much new technology, people are looking at things like graph databases, NoSQL databases, etc. I think that's great as well, because there is no one size fits all. There are so many different use cases, and it's just a matter of taking the right tool for the right job. That said, relational is still a good tool for the job it does, but there are plenty of other options out there now.

So when we go to the cloud specifically, and this is again from that same survey, which you can get from DATAVERSITY or from our website as well, what are the pros and cons? I found this particularly interesting. There used to be this ad, it was one of the big banks, and I used to spend a lot of time in airports until this latest trend we have now. It was sort of, what is heaven and what is hell, and they'd have the same picture. Someone's staying in a tent, right? For someone who loves camping, that's heaven, and for someone who hates camping, that's hell. Or a cruise ship: one spouse thinks that's heaven, one spouse thinks that's hell. In the same way, cloud had some similar answers. So when the question was, what are your reasons for moving to
the cloud, some folks were moving to the cloud because of lower cost, and some people were concerned about moving to the cloud because the costs were higher, right? So it depends on your perspective, and that's valid. We'll talk about that. It isn't necessarily always cheaper. It's like the decision of whether you want to rent or buy a house: there's no one answer. It depends: are you going to be moving soon, what's your age, what's your income, all of that.

The other one that had that same "is it heaven or is it hell" quality is performance. Some folks wanted to move to the cloud because they expected better performance, and some people were afraid to move to the cloud because they expected lower performance. Again, I think there might be a bit of a misnomer, because is it really the cloud that offers the performance, or is it how you're architecting on that cloud? We'll talk more about that. Again, the cloud is the platform, and it offers a lot of new technologies, but you can't necessarily just lift and shift the same things you were doing on-prem and expect them to work the same in the cloud, both for performance and, because there are new tools in the cloud, also for cost.

I'll talk a little more about this sprinkled throughout the presentation, but it is a bit of a culture shift for folks who think of the old days, or current days, right: I have a SQL Server, I spin up the box, and I do things with it. With the cloud, instead, you're being charged by usage and performance. I had one client who, sort of by accident, opened up use of the cloud, and the beauty of these platforms is that it really is easy to scale up and spin up an instance. The entire dev team was doing that and forgot to shut them off. They were just thinking of it as the
sandbox. They got a million-US-dollar bill a few months later, six months later, and had massive sticker shock. To be fair to the developers, it was a bad policy to allow that; the developers really hadn't even thought of it. So that was definitely a case where it was higher cost, not because of the way the cloud was architected, but because of the way they were using it, right?

The ones that were different start with the second one there, better scalability as a reason for moving to the cloud. I would agree with that, not only in terms of the ability to start small. I do that myself. One of the beauties of these cloud platforms is that anybody in their pajamas, and that might be several of you, turn on your camera, just kidding, can spin up some amazingly complex systems literally from the kitchen table on some of these platforms, AWS and Azure and Google, right? So you can start small and then grow, if you're a small organization and you want to scale. The other thing we see with some of our clients is yearly seasonality (today is a great day for my talking). Maybe over the holiday season you have more web traffic or more sales, so you want to be able to scale during the year as well. It isn't all or nothing.

And then in the concerns, there's that idea of losing security and privacy. Again, there are SLAs, and some of your trust is in the vendor you choose. But do remember, as we like to say about the cloud: the cloud is just somebody else's machine. It is not this mythical thing. So anything you're sending over to the cloud could potentially carry that risk. So it's an interesting set of pros and cons that folks are considering.

The other thing to think about is this idea of platform availability, uptime, and SLAs. And again, with all of this, there is no
one-stop-shopping answer. So sorry if you were looking for the easy answer; nothing's easy in tech. But it's good to think of the pros and cons. I don't want to overcomplicate it, but think of the downside risk, which is also an upside: you are no longer owning your own servers, and you're no longer responsible for the downtime. If you've ever been a DBA and had to come in at midnight on a Saturday to get the server back up, maybe that's the best thing you've ever heard, right? Not your problem.

But this tweet there from Azure: I had actually retweeted this because one of my clients, a massively large company in Latin America, had their entire system running on this Azure platform, and it went down, and they had some serious operational issues. So this was top of mind, which is why it was in my tweet stream. And yes, there are contingencies; they could move to a different area, etc. And that doesn't happen that often. But that might make you nervous. I'm sometimes the type of person who would rather make my own mistake than have somebody else make a mistake, because then it's out of your control.

The other thing is that some organizations have a multi-cloud platform to reduce risk. I don't generally like to use product names, but I think we all know at least the big three or the big four out there. So, for example, you could use Azure and AWS for different use cases, or have backups in one or the other. That is how some people are handling it.

The other nice thing about these platforms is that you can do regional scalability. You could have backups in different regions. You could pick regions for performance reasons. You could pick regions for compliance: think of GDPR, or areas where you want to keep your data within a certain geographical region. You can. This is a screenshot from Azure, and you can see they actually do provide a fair amount of
information. In my DBA days, if I had had a nice website like this from my DBA, I would have been thrilled. I never saw something like this. Well, I was a user, I wasn't the DBA. So they're fairly open with a lot of the information, and that's a pro as well.

The other thing to think about is: is there risk, or can they handle it? This was a quote just a couple of months ago from one of my clients, an auto manufacturing company. We were thinking through cloud versus not cloud, and he was that matter-of-fact, blunt kind of guy, and he said, "You know, if Amazon, Google, or Microsoft can't handle it, you think we're going to handle it better? Seriously? We build cars, we don't build software." And that is one of the ideas of the cloud: outsource the things you're not experts in. So insert your industry here: you're an education organization, you're a nonprofit, you're a bank, you're a retail company. Very few of you on the call are probably software companies. I'm sure some of you are; we always have a few. So do you want to be in the business of maintaining your servers? The answer very well may be yes, right? I used to work at a top-secret facility where absolutely we had our servers in a room that nobody else could go in, and all of that. But if you're not that, and you just want to spin up a data lake to do some testing, by all means use this. And there's everything in between.

So I thought this was a decent slide to look at some of the different aspects of platform availability. There are definitely positives, and there's definitely some risk as well. And there are certain ways to mitigate that risk, like multi-platform, or a mix of some on-prem and some cloud. It doesn't have to be all or nothing.

So when we look at some of the benefits of moving to more of a cloud-based data warehouse, one of them, which I touched on, is the ease of entry. So, you know, in the
on-prem world, you literally have to buy your server and have it set up. You either have to wait for somebody to do that and give you access, or learn how to do it yourself, and that is realistically a daunting task. It's also a cost. So again, you can sit at your kitchen table and spin up an AWS instance very quickly. Obviously there's a cost, and there's also a learning curve. Again, you may know on-prem and not cloud as much, but a lot of those skills do transfer.

Also, look at the increased focus on analytics. The topic of this is data warehouse, and when we think of data warehouse, a lot of us think of the traditional: I'm doing my financial reporting or my sales reporting; I want to know sales by region, by sales rep, by product, all of that. Often that is in structured, relational-type databases. But more and more, to a great extent because of these cloud platforms, there's the availability of so much more data, and non-structured data, or even data from the products themselves. Think of it if your product is a Fitbit: you could know exactly how people are using your product, and when, and what customer sentiment is, etc. And you probably can't store that very easily on your SQL Server instance on-prem, right? You probably want something with greater scalability, and you want to be able to mix them together. It isn't either/or: I want to know sales by region, and I want to know usage by my customers, and what the sentiment is, and are they tweeting about my product? That's when it starts to meld. So as more people are moving away from just, and I don't want to say "just," because it's still super important, your traditional descriptive analytics to more prescriptive and predictive models, you do want that flexibility and scalability.

Which leads to the third one, which is similar: that volume, velocity, and variety of data. Not only
more data, or the ability to get more data, but variety of data: it could be sensor streaming data, it could be voice. With so many of the customers I work with, you might not even think of that company as one that needs big data, but take even something like support call logs: can you stream the text of the support calls and do some voice analytics on that?

And cost savings: often that is why folks go to the cloud. It could be the fact that it's a low cost of entry, but then there's that ability to scale. It's similar to: I'm going to sell you a car and it's only $9.99 a day, but you're going to be paying that $9.99 for the rest of your life. That's probably going to be more expensive over time than just buying it up front. That's the idea of OpEx versus CapEx, and that's something to check with finance. Some companies want one or the other, and that helps make the decision. And I already mentioned flex usage: think of your seasonal variability. Maybe it isn't something you use all the time, and you want to have that flexibility. And I mentioned already that cloud does not always mean lower costs. Consider your usage patterns and your practices before making that jump.

I've already mentioned as well that democratization of data. That is one way these are growing: it really is easy to, quote, spin up some of these new instances. You don't necessarily have to be a platform expert. But, as we'll talk about later, that's also the downside, right? I've told this story too many times, but when I bought my house it was a bit of a fixer-upper, and I had grand visions of doing everything myself. After I spent a weekend trying to put up a wall, I had the contractor come in, and he literally did it in an hour while I was on a conference call. I realized that was not my strength. And yes, it was easy to go buy the lumber from Home Depot. It
wasn't about that; I did not have the skills. So it's the same thing with data, right? There are certain best practices that we know as data management professionals that make it more effective. So it's easy to feel like an expert, but you can also get yourself into trouble. And that also leads into some of these cost savings: when the data warehouse is performance-tuned properly for the cloud, yes, there can be cost savings. But we've seen with some of our clients that, again, it seems easy: I can spin it up and put data in. But are you optimizing the way that data is stored, or are you really doing expensive things? "It's fast enough. Do I really need to model the data like I used to? It still works." Well, you're spending a lot, maybe you're wasting cycles, and yeah, it works, but there probably is still a better way.

I touched on this as well: back in the day, or was there ever a day, when we only wanted to do descriptive analytics? I sold this much product last month. It tends to be hindsight, more descriptive. And to give credit where credit is due, this is Gartner's graph of the evolution of analytics. As you get past descriptive, you might say, okay, sales dropped last month, but why did it happen? That's diagnostic analytics. Then, even better: what's going to happen next month? Will they drop again? That's looking into more predictive analytics. And then prescriptive: how can I make that happen? A lot of the companies I'm working with are really looking at that in terms of next best action. You take things that look at the past and say: I know that, based on these three data points about the customer, I should do X; or if the manufacturing plant does this, I should do Y. It's really understanding, based on past history, what you should do next. And that really gets
into the benefit of some of your analytics. But in order to do that, you need those volumes of history and variety, and that's where a lot of folks are looking to more of a data lake style implementation, where you can get that volume and variety of data to really do that type of analytics.

I cracked myself up with this picture; I thought it was cute. When you think of the data warehouse versus the data lake, here's the same guy in different clothes. The traditional data lake is a bit more casual, right? You can do more exploration. A lot of folks use it for things like sandbox analytics. Not that you can't have data lakes in production, but it has tended to be a looser environment, partly because you can spin these things up easily. I've seen clients where IT and sales didn't get along, or sales and marketing didn't get along, and marketing went and bought their own lake and did it themselves. They were techie enough to be able to do that. So it is a looser way of working, more of that exploratory, Silicon Valley startup kind of attitude.

The traditional data warehouse is more structured, and it tends to be more the guy in the suit, for financial reporting. And yes, it does take a bit longer, because if I'm reporting to the Street our sales last quarter, they have to be right, and I have to have the lineage for that. So it's not necessarily that one is better than the other, but they are very different things. You'll see that's the "or" condition there, right? You either have a data lake, with its pros and cons, for that sandbox analytics, or you have your traditional data warehouse, which is more strict and more formal, for things like financial reporting. But in the new world, as we go to some of these data platforms, wait for it, it can be the best of both worlds. I was very excited to find these pictures, as opposed to more of your XOR condition, right?
So that is the mullet: formal up front and, I got it backwards, you know, short in the back. As we get to these cloud worlds, that merging of what's the data lake, what's the data warehouse, what's the staging area, and does it have to be just relational or more mixed methods, is really where some of these platforms are heading, which is pretty exciting. I call that more of an integrated data science platform, where yes, it's the best of both worlds in some ways, if you know how to optimize them. You can do your traditional, more relational style of data warehousing, but you also have the scalability and the flexibility to have different kinds of data sets. So both the suit and the jeans together.

And from the beginning, I'm sitting on my porch saying "get off my lawn" with this one, because I was old-fashioned before it was hip. When data lakes came out, I think there was a lot of vendor hype: we don't need data warehouses anymore, you just put stuff in this place and magic happens, right? I think a lot of us who knew the technology were a bit skeptical of that. Again, not that data lakes are bad things; it was the all-or-nothing, the idea that these are the new nirvana. So I think there's been some disillusionment. You'll see here, again quoting Gartner, they have their hype cycle for different technologies, and data lakes are a bit into the trough of disillusionment. But I think they'd be the first to say that often this disillusionment happens because things are overhyped. If you look at some of the things that are on the plateau of productivity, data quality tools are fine things, right? It's just that, probably, back in the day those were hyped as well. So again, it doesn't mean data lakes are bad things. I think it's coming back into reality. You'll see, too, that in the trough of disillusionment
is some of the Hadoop distributions, and I'm seeing that as well. I think when a lot of folks think of big data now, it is definitely not an on-prem type of solution, but maybe also some of these cloud solutions that are less Hadoop-focused, and there are some other solutions out there. So I found that interesting. I had my I-told-you-so moment, because I can sometimes be a curmudgeon, but when I am a curmudgeon, I tend to be right. I pick my curmudgeonness carefully.

So I think this is a more realistic approach. If you remember back to the survey at the beginning, on how people are using data lakes and data warehouses: if you think of the green on the left (gosh, I'm having a day), the data warehouse is more of your enterprise system of record and/or data marts, and we can argue the differences or similarities there. Also things like your master and reference data, where yes, by design, they are modeled, they are structured, and they are validated, and you want them to be. Again, you're reporting your financials to the Street; you want them to be right. I want a single master view of all the physicians who are certified to do surgery in my hospital; yes, I think I want that to be accurate and mastered and correct.

And then the data lake is more of your discovery and sandbox, which tends to be data exploration or lightly modeled data. But again, especially as we move forward with some of these platforms I'll talk about, it doesn't have to be. Early on, folks were starting to move the concept of the lake into more production-level things. We often think of the lake as the sandbox, but think of this: I'm a manufacturing plant, I'm doing Internet of Things, streaming sensor data from my manufacturing machinery. That's not sandbox, that's my business; I'm running off that. If the sensors say something's down, I need to make that next-best-action business decision off it. So I don't want
to overstate that it's sandboxing, but often the driver has been around analytics. But regardless of whether it's a lake or the enterprise system of record, you still have security and privacy, you still should have governance and collaboration, and you should be able to report across all of those. Historically those have been separate environments. We'll talk later about governance and how that might be different in each environment, but you still need it, as well as security and privacy.

And that's the trend. I often have a customer ask things like, "Would you see document management the same as data management, and should that be part of the governance council?" I usually have a snarky answer, something like: well, if someone steals your credit card information, and later you go to your credit card company and they say, "Well, it was in a PDF, it wasn't in a database," they still stole my credit card. That's a great example of: yeah, it's the type of data, not necessarily the format of it. Security and governance should go across all of it.

When I bring up negative client examples, I generally anonymize to protect the innocent, so I won't use this person's name. I remember a massive financial institution in the New York area, and they were talking about their PII policy, you know, personally identifiable information, or PCI with their financial data. And very earnestly, a younger gentleman raises his hand, and he's like, "So I shouldn't have been putting the credit card feeds down on that exploratory data lake?" He looked lost very quickly; he had a pretty full bucket of work ahead of him. He literally didn't know that you really shouldn't just be taking live PCI data and throwing it out in the lake to do some analytics on. He quickly found out. But that's really a risk, especially as these things become easier to spin up. You want to make sure
that all of these citizen data scientists have the same idea of data governance and security as is needed.

But as we go into this more modern data warehouse, I think the opportunities are exciting, and this is where the half jeans, half business suit comes in. Before I go into these examples (look at the lower left), I want to completely caveat: these are just examples. Data warehouses can be done in different ways, and there are pros and cons to each, as there are with the cloud. These are just two examples of some semi-popular ways of doing it, but I think they're indicative of the story we're going to tell.

In the traditional data warehouse, you're probably familiar with this type of model. Again, there are different flavors, but you have your source systems, and you generally, I hope, have a data model for those, and hopefully a glossary to understand what that data means. You will probably want to stage that into some sort of landing area or staging area, whatever you're going to call it, probably in its source form, and you probably want some model of that. Then at some point, within the warehouse, whether it's Inmon or Kimball or however you want to do that, you have a model for it. You generally do some sort of ETL, or extract, transform, and load, because how it lives in the source is probably not how you want it in the warehouse. You want to start to massage that, and normalize it or star-schema it or whatever, so you can start to slice and dice it for your business needs. Again, you'll have models there, whether they're relational or dimensional, and you'll want things like business glossaries. Often there's some sort of cube that not only helps you slice and dice easily but also gives you that business semantic layer. What does TBL_CT mean? I mean, that's the count of tables or whatever, but it puts a business layer onto that. You can also get some of the definitions
there, and then there's the idea of the reporting and the dashboards on top of that, with maybe a BI tool, for example.
If you look at the cloud data warehouse, there are just more options. You can still do what we described above; it's still a very valid model. I have several customers doing that now, and it works, and that's great. But there are also more options that can blend those two. If you look up top, those data sources were all relational databases. With these new data sources, maybe I want my video files, or my sensor data, or my audio, chat logs, etc. You can put that in the landing zone, which is kind of like your staging area in a way, but it can have a mix of that lake and the structured tables, and that can exist in its raw form. You can then move that to a staging area. Maybe you start to put that in third normal form; maybe you want a warehouse up there.
The example here (I saw someone cheering; we don't talk about that a lot on these webinars) is data vault, which is a way to store the data that's flexible enough that you don't necessarily have to put it in the warehouse. Again, the benefit of the warehouse is that I want to know exactly what I'm going to report on and how I'm going to report on it, and make it consistent. That's a strength, but it's also a limitation. The idea of the data vault is that you model it in a way that is structured, with some business layer on it, but it's flexible, and you're keeping it in its raw format because you don't necessarily know how it will be used. So it's a bit of a best-of-both-worlds approach, if you're a fan of data vault. Then in this example you can have the data marts and star schemas on top of that as well.
The data consumers are often more broad as well. Again, it's not that you can't do that in the warehouse, but maybe it's an API, maybe it's more analytical, maybe it's R or
Python on the front end, or maybe it's your Power BI or Tableau or MicroStrategy, whatever you're using on the front end. This has been referenced from Qlik, so I am referencing their model at the bottom.
And then you'll notice, at the bottom: often when we think of a glossary, or the old metadata repository or data dictionary, it tends to be source specific. Here's my data dictionary for the warehouse. These data catalogs can be a bit more broad. They can have lineage across different sources, they're a little more user-friendly, and you can do search and governance, a bit of search and discovery for your data assets, mixed with metadata about what those mean.
So again, if you remember the goofy guy in the half suit, half jeans, that's where we're headed with these modern data warehouses, where it doesn't necessarily have to be either-or, with your lake in one place and your warehouse in another. Either platform-wise or just design-wise, you can start to merge and morph the two, which is pretty exciting, or just pretty boring and banal when you think about it. At the end of the day, we're taking stuff, we're dumping it somewhere, we're massaging it, and then doing stuff with it. I mean, I'm oversimplifying very complex jobs.
The other thing that was highlighted there, but maybe not enough: I've seen very intelligent, otherwise very nice people get into arguments about whether it's ETL or ELT. Is it extract, transform, and load, which is more of your traditional data warehouse? Or do you extract it, load it into this more active landing zone, and then, when you know how you want to use it, transform it, whether that's for a data mart or an OLAP database or however you want to use it, or flatten it out for some analytics? In a lot of ways it's kind of the
same thing, right? You're dumping it somewhere. Are you transforming it before you dump it or after? Is it a flexible dump? It's the same sort of pieces; it's just how you put them together. Maybe I'm oversimplifying, because I know there's complexity, but sometimes it is helpful to step back from some of these heated conversations and be a little bit more abstract about things that can get complex very quickly.
I mentioned this before: this idea of the democratization of data warehousing. So there's Joe in his pajamas. Joe, do you want to turn on your camera so we can see? I'm just kidding. But that might be you right now, right, sitting at your kitchen table. And that's the beauty of these new systems: from your kitchen table you could download NASA data and do your own analytics, with the power of some of these laptops. A friend of mine, his dad actually did work for NASA, and he's an older guy now, and he says, "Man, the power you guys have even on your laptops today is so much more than we had in our systems." And now it's beyond your laptop, right? You have all the power of these platforms.
You might wonder about that picture up top. Either now or back in the day, before you could start up a system you had to go through the DBA. They may be offended at being depicted that way, or they might be like, "Yep, that's it. I own the keys to the castle and I've got a dragon; you are not going to get on my system." And to be fair to them, it may be the enterprise data warehouse, and you don't want Joe in his pajamas all of a sudden logging into the data warehouse and starting to change some figures: "Oh, I wonder what it would be like if we gave Mary a raise; let's see if anyone would notice." You don't want that sort of thing.
Here's a story I probably shouldn't tell. I had a friend who's a, quote, data warehouse expert, and actually, that kind of looked like
my kitchen table, and he was really proud of a system he built for a healthcare company that will remain unnamed. He logged in from my kitchen table and said, "I just want to show you what I did. I'm just going to go into prod. I know this guy just had surgery, but I'll delete it after." And he went into a live patient record and changed some of the data to show me how his report was going to show differently. I just gasped and almost fainted. Talk about a HIPAA violation: he literally went into a source medical system and changed someone's medical record, which he was then going to delete later. That is not what you want, which is why you have the database administrator with the dragon behind him, and data governance.
So again, it's not really a bad thing, but now Joe in his pajamas can go very easily onto things like Amazon Web Services, Google Cloud, Azure, etc., spin up a platform, and do some amazing things by downloading some of these open data sets or uploading his own data. That is a very exciting part of cloud, but it's also a risk, and we want to manage that accordingly.
And the fundamentals still apply. I guess we mentioned that before with the analogy of me trying to knock down a wall in my house until my contractor friend said, "Do you know that's a load-bearing wall?" That's a good point. I might have wanted to think about that, and had I built more than one house, as he did, I might have thought of that. Same thing with database design: there are core best practices that those of us who have been in the business for a while have learned. But again, it doesn't mean that everything has to be a star schema or a relational database. There's not one size fits all; just do it mindfully. Balance your use cases for what you need: performance, scalability, usability, etc.
And of course it would not be a Donna Burbank webinar if we didn't use the word metadata, because that is
critical to anything. You need to understand the context, the traceability, and the meaning of the data, and that is part of the risk of spinning up data quickly. You can load things, and at a number of clients it's happened that someone did an analysis and was told, "Well, that's not what I meant by that data source that you used, so therefore the analysis is wrong." You really need to understand the lineage, the context, and the usage of that data.
Data quality, of course. There are plenty of statistics on this. A lot of data scientists get a great degree and are super smart people, able to do these analytics, and in their first data science job, 90% of the time it's just cleaning up the data: working out what an "M" in some field stands for, or whatever it may be [partly unclear]. You have some very boring and banal things. So the more you have great data quality, with things like master data for example, the easier it actually makes more modern data analytics. Again, one could do the either-or thing, which I'm not a fan of, and say, "Oh, we don't need things like master data; we're doing big data analytics now." It's an "and," right? If you're doing big data analytics on customer sentiment, you absolutely want a good, solid list of customer master data, and to get that you need things like data governance. As usage increases and more folks have their eyes on this data, you do need more data governance and accountability for that.
Moving ahead, this was a TDWI report, where they were asking how we get faster insights from this faster data. Just to back up that case, so it's not just me saying it: when it looks at the impediments to really getting value from some of these new systems, number one was data quality issues. The good news on that (and some of the folks on the call might be asking how to get visibility for data quality with the business) is that sometimes the
best way is to do a report and let folks see it. They probably don't see that the data is bad, so spin up the report they want, and then maybe they can see some of the data quality issues. But that will, long term, be a hindrance to getting the value. Next, data silos: some of that could be remediated by things like a data warehouse. Then governance and regulation is huge, right? We cannot get any value out of the data if we don't know what it means, who's responsible for it, what the lineage is, etc. And then data transformation: that's your classic ETL, ELT, whatever, but at some point you have to have some understanding of the lineage of that data. Do we need to transform it to be consistent, or do we want to keep it in this raw form? Again, there's not always one answer for that.
Back to the governance. This is a slide I use a lot, because it clarifies a lot of conflict in organizations: just enough data governance, based on the platform. It is absolutely not all-or-nothing. A good guideline is: the more data is shared and used, either across the organization or beyond it, the more formal the governance needs to be. One of our clients did a lot of open data publication; they were an agency that published scientific statistics. They absolutely wanted that right, so they had very strict governance: it was reviewed by several scientists before it was published, that sort of thing. Or think of, as I mentioned, your physician master data, or your customers, or your patients. You absolutely want that right, and you want very strict governance around it.
That's very different from the bottom, which is maybe just exploratory: I don't know what people are saying about our products; we just want to do some social media sentiment analysis and see. You don't want to over-regulate that, because then there won't be any innovation. You don't want to kill innovation just for the sake of having extra governance, but you don't want to under-govern it
either. Back to that PCI example, where someone was putting customer information on an unsecured platform. And then there are layers in between. Core enterprise data is that one step below. If you think of master data, or even reference data, as your golden record, those are the jewels; they must absolutely be well governed and well managed. Your core enterprise data may be things like your enterprise data warehouse, where that's still pretty important: I can't report bad numbers to the Street, right? Then one level down would be (you could call these different things) your functional or operational data. Maybe that's a mart for a particular area; maybe it's a relational database that needs to be good enough to see, I don't know, the performance of my servers or something, but it's not mission critical in terms of being looked at by the board or something like that. And then your exploratory data, as I mentioned: that might be your sandbox. I don't know the answer, the data doesn't have to be perfect, I'm just looking for trends.
But the other important thing to remember is to create an operational life cycle for this. We're actually doing this right now with a company that has what they call their futures innovation team, and the entire job of that team is to find cool stuff and start to do some sample dashboards and analytics. Once they find one of those nuggets, how do you move that to production and publish it? It's a great idea; now we want to make sure there's data quality and there's governance, and we're putting in that operational process, which is what this is trying to describe. It could be an entire model: yep, we did a test, and we want to move that analytical model into production and make sure the data's right. Or it could just be something simple like a field. Maybe that field isn't in the warehouse, but when we did our analytics we
found that weather is a predictive indicator, so can we put the temperature on the day when people bought that product in our warehouse? Maybe that's important, or whatever. But there has to be that life cycle from discovery, so that when there is something cool, we can move it into production. And of course, where there is production-quality data like master data, I pretty much guarantee that your data science group would want nice clean data. So make sure you have both in your ecosystem.
As we talk about design: there are these great new platforms that can be lightning fast and can scale. In the spirit of curmudgeonry, a lot of folks say, "The star schema, you don't need to do that anymore; the platform is so much faster." So is the poor star schema dead? He looks awfully sad there. I would venture to say no. I think we saw plenty of examples where it has its place; it isn't the only place. Performance is one reason, but it's also just a nice way, for usability, to slice and dice the data. As you know, when you design a database, part of it is for performance and part of it is for usability and readability. A column-store database can be very fast, but if you're trying to query that later for analytics, it's pretty hard; it isn't just SELECT name FROM table, because it's just not labeled that way. That's really not its purpose. So again, think of the star schema. I see it all the time, we use it all the time. It isn't the only solution, of course, but it doesn't mean it's going away, in case anyone on this call thought that.
Which leads to this idea that there is not one design. Please do think of these as design patterns, just like you might have a pattern for a suit or a dress. It doesn't mean that people can only wear this one dress; you pick the right pattern for the right occasion. So as long as I've
been in the industry, there has been the battle of whether it's an Inmon or a Kimball data warehouse. There are pros and cons to each, or people mix them together, but people do seem to like to have their arguments. There are pros and cons to the third-normal-form type of approach, having your enterprise normalized database, and there's also definitely a place for your star schema slicing and dicing, and they can be combined.
Data vault we mentioned. Each one of these could be a whole webinar, but this is just to give some ideas of things folks are using on some of these platforms. Data vault, as we mentioned, is a type of modeling where you keep things, often in raw form, modeled in a way that's flexible. Again, the pros and cons of a data warehouse: it is rigid by design, so if your business rules change, you need to change the warehouse, and of course the business rules do change. The data vault flips that a little bit: I don't know what the business rules are, so let's store the data in a way that's not just willy-nilly like a lake, but modeled so that I can be more flexible down the road.
Columnar (how do you say that?) again is great for speed. If you're not familiar with it: instead of focusing on the rows, as you would with a table in a spreadsheet-type way, you flip it and you're really focused on the columns, and you can look at just one of those. That can be a very fast way to do some analysis, or for web pages and things like that. It's not always great to query, but again, it has its place.
The more you work with a data science team, the more you'll find they want to flatten everything, right? That's the easiest way to do some of these analytical models, or denormalize everything. And again,
nothing, absolutely nothing, wrong with that for that particular use case. One big flat table is probably not the best way to store your warehouse. But again, I've seen grown adults argue about these things, and it's not that only one of them was right; they just weren't seeing each other's point of view.
And more, dot dot dot. There are so many different ways, and a lot of these data warehouse platforms and cloud platforms have a variety of solutions. And I get it; probably because the industry changes so fast, you do hear crazy things. I've heard "they don't have relational databases in the cloud" or "everything is NoSQL." Most of these vendors that I mentioned have a quiver of tools that support a lot of these different modeling patterns, so do take advantage of that. Graph might be another one. I could go all day on the different types, but we don't have that time.
That's also a nice way to test some of these technologies, because you can spin something up; you don't necessarily have to go buy a massive platform to learn a bit. A lot of these vendors do have really good education about their platform, but also more general material as well, so I recommend it as a great place to learn some of these technologies.
So, in summary: reporting and analytics continue, no surprise, to be a big business driver as more and more companies want to be data driven. Cloud-based technologies offer a myriad of new opportunities for things like scalability, performance, ease of use, cost, and flexibility, which you can take advantage of. For me, and there are a lot of things one can take away from the new cloud-based data warehousing platforms, I'm excited about this idea of the data lake and data warehouse merging: yes, there's a place to store a variety of data sources, there's a variety of ways to model those, and let's keep them
close together and not create more silos in tech that don't need to be there, and use them for the best of both worlds. That does lead to more citizen data scientists, but just like me trying to break down a load-bearing wall in my living room, you've got to be careful what you're doing. The core fundamentals still apply. If you are someone who's new to cloud data warehousing, don't be afraid to go read; I just reread the old Kimball data warehousing book myself, and there are some good things in there. Don't be afraid of the fundamentals, because they still apply. But there are also new things. If you're someone who's been in the industry a long time and learned with Kimball and Inmon, go take a look at some of these cloud platforms and some of these new technologies, NoSQL being one. It's an exciting time to link these together, and no matter which one you pick, you can't skip the governance or the quality, because that's exactly what makes your data sane.
Just before we open it up to questions: those graphs with percentages, all the ones that were from our paper with DATAVERSITY, can be found on dataversity.net and also on globaldatastrategy.com, and Shannon generally puts links to all of this in the follow-up email that you will get shortly. And please do join us if you're available on April 26; we'll talk about master data management and how that can be aligned with your governance and your process as well as your data. Without further ado (oh, and if you want help, we do this for a living), any questions or thoughts? Shannon, we can open that up for questions.
Donna, thank you so much for another fantastic presentation. Just to answer the most commonly asked questions, a reminder: I will send a follow-up email by end of day Monday for this webinar, with links to the recording and links to the slides. If you have questions for Donna, feel free to submit them in the bottom right-hand corner of your screen. I've
seen some chats going on, so I'll give you a little time to enter that in.
So, Donna, doesn't regulation like CCPA and GDPR require governance all the way down through the exploratory data?
Yes, and that was the point I made earlier: you want a governance layer across all of your layers. That was the poor person I keep bringing up who had put PCI out on the exploratory cloud and got into quite a bit of trouble, or the example I gave of someone stealing your credit card and saying, "It was okay, it was just a PDF in AWS." You still care. So there are certain things that are non-negotiable in terms of security and compliance. But I would say, in terms of governance of what a field means, in certain ways you do want to let your exploratory platform be exploratory. Things like master data have more governance just by the nature of the beast. If you're trying to get a single golden record of all of your physicians doing surgery in your hospital, you'd absolutely better be right. If I'm trying to do sentiment analysis on what my customers think about the new flavor we launched last week, that probably doesn't need to be governed as heavily. But yes, in either case I still can't send out patient information or client personal details. That is true; regulation does cover that. What I was trying to say is that beyond that there's nuance: you don't want to over-govern the poor people trying to do exploratory stuff.
And do you have any comments on automation for speed, for enterprise data warehouse build-out?
Automation for speed. Well, I think definitely you should be using some of the loading tools; all of that should be automated in terms of your ELT or ETL. Nothing should be manual anymore. It's about getting those processes in place for the automation, trying to make sure that load is as fast as possible, with things like change data capture, and being smart
about that. A lot of the performance and tuning can be automated, from reports through to your error checking. Most everything in that platform, I would say, should be automated, with the exception of things that need a human, like your design and your modeling. The more time you spend automating the stuff that is repeatable, the more time people can spend looking at things that need to be looked at, like your design and what things mean. That said, even with some of the metadata tools, for stuff that we used to do by hand, like abbreviations and what they mean, there are some neat AI tools that can do augmented learning; even though a human needs to look at it, the work can be augmented. I'm not sure if I answered your question, but for most things that we used to do by hand: if you think it's getting repetitive, look for a tool to automate it, because it probably exists.
Donna, how do you govern, catalog, and manage metadata across on-prem and cloud, especially when cloud is often outside normal IT project governance?
I have a slide on that in another presentation. I think you also have to think of it not only as cloud versus on-prem; it's about usage, back to that pyramid. Start with: what's the use of your catalog? Is it the documented definition? You might have heard me on previous webinars talk about encyclopedia versus Wikipedia. Is this the standard published definition that is published to all and should not be modified without governance to check it? Or is it more collaborative? There are some tools that are more collaboration tools, that allow back and forth. Most of these platforms can see both the cloud and on-prem. When you start to go beyond or between companies, there's this idea of metadata registries, and there are cloud, sort of web-based, models.
A lot of folks that publish open data, I'm seeing more and more, have a nice metadata set. We're working with a couple of companies now on this: if you think doing a data model within a company is hard, trying to have some industry standards across organizations helps as well, because if you're going to be sharing data between organizations, there's probably governance between organizations as well. Governance doesn't always end with your own company. So that was a laundry list of things, but hopefully one of those hit the mark.
Indeed. And so, Donna, why is security and privacy at the bottom? It should also be at the [unclear] layer as well, right?
Can you say that again? The security layer's at the bottom?
Yeah, why is security and privacy at the bottom?
So security being at the bottom doesn't mean it's last. I think that was a graphical way to say that it's the foundation, so that it goes across. I guess we could have put it at the top or along the side, but in my architecture diagrams I generally put it at the bottom, along with governance, because to me that's the foundation on which everything is built, if you look at it that way. And then, either left to right or top to bottom, I tend to put things like reporting and the user-facing stuff either up top or on the right. So it's just a design thing. It doesn't mean it's at the bottom in terms of last; I would say it's the foundation.
I think we've got a few minutes left, so I'm going to try and slip in a couple of extra questions here. How do cloud data warehouses alleviate the need for OLAP cubes?
Why do they alleviate the need for OLAP cubes? Well, I think in a lot of ways some of the performance can be faster with some of these, but I think that can also be a misconception, because there are several reasons for OLAP. One is the ability to have faster performance, but the
other is some of the usability. I just find it really intuitive to have that cube model, because it's that slicing and dicing, or an Excel pivot table, that a lot of people can relate to. In that sense I think some of those paradigms don't go away. But there are also other ways to do that. A lot of those cubes have a nice semantic layer with what the terms mean, a business layer, and some of that can be done through a good data catalog. I don't think that concept goes away. I don't think it's outdated. There could be other ways, but I do think it's something a lot of business people can get their brains around easily. But yeah, that's my hope.
And what is your opinion on the late-binding approach for data warehouse build-out?
I would think that gets back to your ETL versus ELT. With some of these data warehouse platforms, the idea of loading it all up there is not a bad idea, and then deciding, as you want to use it, how it should be modeled and how it should be transformed. I do think that's a trend that has a lot of validity to it, because you don't know everything up front. In a traditional data warehouse there is that design-and-build mentality: before you load the data, you should know how it's going to be transformed. But in this more analytic world it is more about discovery. So don't throw the baby out with the bathwater: have that platform where the data is more raw, and then you can decide, schema-on-read, how you want to transform it later. New platforms let you do that really nicely, because that's the guy with the jeans and the suit: you can do both.
I love it. Well, Donna, this brings us to the top of the hour. Thank you so much for another fantastic presentation, and thanks to all of our attendees for
being so engaged in everything we do and hanging out with us today in this crazy world. Just again, a reminder: I will send a follow-up email by end of day Monday with links to the slides and links to the recording of the session. Thanks, everybody. I hope you all have a great day, and stay safe out there.
Thanks, Donna.
Thanks, Shannon. Bye.