Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Officer for DATAVERSITY. We would like to thank you for joining today's DATAVERSITY webinar, What's in Your Data Warehouse?, sponsored today by Anomalo. It is the latest installment in a monthly webinar series called DataEd Online with Dr. Peter Aiken. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A section. Or if you'd like to tweet, we encourage you to share your questions via Twitter using hashtag DataEd. And if you'd like to chat with us or with each other, we certainly encourage you to do so. To open and use the Q&A or chat panels, you'll find the icons for those features in the bottom middle of your screen. And just to note, the Zoom chat defaults to sending to just the panelists, but you may absolutely change that to network with everyone. To answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording and will likewise send a link to the recording of the session, as well as any additional information requested throughout the webinar. Now, let me turn it over to Zach for a brief word from our sponsor, Anomalo. Zach, hello and welcome.

Hey, Shannon, really appreciate it. Let me share my screen and hop into what we'll be going over today. First of all, hey, everyone, my name is Zach. I'm a solutions engineer here at Anomalo. My background was as a data engineer and data analyst at Capital One, then a data engineer in hospitality as well as ad tech. My previous position was at Tableau, where I was helping clients with visual analytics. So suffice to say, across my entire career, every single job I've had, every single data set I've helped clients with, my entire experience has been plagued by data quality issues. And here comes Anomalo; that's why I joined. What Anomalo is, is a modern data quality platform that automatically catches anomalies in your data and finds the root cause. We're going to go into a quick product demo of what Anomalo can help with in the data quality space. First, I want to go over a quick background. In the modern data stack, anomalies can occur in quite a few different places. First, of course, they can occur way upstream when we're ingesting data, or they can even occur at the source. But beyond that, even after we take that data, we can create anomalies through our ETL process or through orchestration. Maybe our pipeline breaks, or our business logic is implemented incorrectly or doesn't handle a certain type of exception. Beyond that, we can even have data quality issues at our Enterprise Data Warehouse. And this is a big problem because this is the source of truth for all analysts and end users downstream, who can also experience data quality issues: maybe calculations are mis-created within dashboards, or ML models aren't trained properly. So where should we monitor data quality? Right at the Enterprise Data Warehouse level. The reason is that we'll be able to catch everything at the Enterprise Data Warehouse level, including what comes from upstream, both at the transformation level and at ingestion. This will protect end users; any anomalies that might occur beyond this point would be end-user generated.
So if we're monitoring data quality, there are three different things we'll have to do. First of all, of course, we have to detect that issues are occurring, and we need to do that proactively. But not only that, people need to know, and the way we do that is by sending alerts through tools like Slack or Microsoft Teams. And then finally, after people know that there are issues, the next step is of course to resolve them, and that can often take a lot of time. So we need a way to root-cause those issues as fast as possible. Ideally, a data quality monitoring platform would be able to automatically take care of all three. So there are three different types of checks we can do, and actually, I think I'll jump into the demo and go over them there.

So here I am in Anomalo, and what we'll be doing is monitoring a certain data set. This is my fact listing table, and it contains ticket sales data for shows and concerts in the United States. The first thing you'll see is six different categories of checks right here. But before hopping into those, I want to take a look at this visualization. What Anomalo is giving me here is a profile of my data. All of these columns are within my table, and within these columns, my venue state for example, I can see a distribution of the records. I see that it's quite common for venues to be located in New York and California, so I know there could potentially be a lot of ticket sales there. Going into the actual checks: the very first line of defense is data freshness and data volume. Data freshness is just asking, is my data arriving on time, is yesterday's data here in my table? And data volume is saying, okay, I know that at least some data has arrived, but have all those records arrived, is my data complete? What Anomalo is doing is calculating the total number of records that are loaded each day, plotting them over time, and then generating a time series model. What that model is trying to do is predict the range in which, on any given day, the number of records should fall. So it looks like yesterday we were right within that range, but a few months ago we would have received an alert because this was well below the expected range. Now, of course, these models take into account components of seasonality. For example, if I expect no records, or very few records, to come in on weekends, then this time series model will adjust, and the next time it happens on a weekend, it'll say, hey, this is actually normal. These first two, you may have heard the term data observability thrown around a lot in the industry; they fall under that. It's a fairly shallow way of monitoring, but we want to go a lot deeper into the actual content of the data itself. The two ways we can do that are unsupervised machine learning and custom checks. With unsupervised machine learning, the objective is to automatically catch issues without any sort of configuration; I don't have to write rules here. We could either look for significant increases in missing data, such as NULLs or zeros, or drops in particular segments. For example, the total number of records in the table may not have dropped, but this pop segment could have dropped 30%, and I want to catch that. Or we want to detect if data changes in any significant way, if there's data drift.
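(To make that volume check concrete, here is a minimal, hand-rolled sketch of the idea Zach describes, not Anomalo's implementation: a real system would fit a seasonal time series model, whereas this simply flags a day whose record count falls outside a standard-deviation band; all names and numbers are illustrative.)

```python
# Illustrative only: flag a day whose row count falls outside a range
# learned from recent history (a stand-in for a proper time series model).
from statistics import mean, stdev

def volume_alert(daily_counts, latest_count, n_sigmas=3.0):
    """daily_counts: historical records-loaded-per-day; latest_count: yesterday's total."""
    mu, sigma = mean(daily_counts), stdev(daily_counts)
    low, high = mu - n_sigmas * sigma, mu + n_sigmas * sigma
    return not (low <= latest_count <= high), (low, high)

history = [9800, 10150, 9950, 10020, 9890, 10100, 9970]
alert, expected_range = volume_alert(history, latest_count=6200)
print(alert, expected_range)  # True -> well below the expected range, so send a Slack/Teams alert
```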
We'll go over an example of that in a bit. And finally, we have key metrics and validation rules. Key metrics are KPIs that I'd want to monitor over time. Let's say I care about the average number of tickets I'm selling. I'm a data scientist or data engineer, and I want to self-serve in a very simple way. What I would do is click on this icon, and without any coding experience, I'm just going to look for my average. All I have to do from here is select my column, my number-of-tickets column. I do have advanced options, but I'm just going to run this without any other configuration; no coding is actually needed. And let's hop into this; it should finish pretty quickly. Okay, cool. So I actually just got an alert in Slack. Not only did Anomalo catch the anomaly, it also alerted me, so I'm getting this proactively, and I can take a look at it in Anomalo. Here I can see that the average number of tickets is generally around 10, and then shot up to over 13. So where key metrics are things we'd measure over time, with validation rules we actually care about every single value within a column; we're getting very granular here. We might care that every value in our list time column is never null, or that every value in my list ID column is unique. What this check is doing is saying: when I multiply column one by column two, it should always equal column three. And it's telling me there are 233 records that fail the check; they don't follow this relationship. More than that, it gives me insight into why this is occurring, where in the data it's happening. What this is saying right here is that 100% of the bad records in my data set, and that would be these 233 records that failed the check, have something in common: they all happen to have a venue state of New York. But at the same time, only 26% of the good records in my data set have a venue state of New York. That's extremely significant evidence that a much higher percentage of the bad data occurs in this venue state, and I know to address my issue there; it helps me find that root cause. Now, the last thing I'll mention about these is that they were really easy to set up; without any coding, we just clicked a few buttons, and Anomalo calculated the metric, plotted it over time, generated a time series model, alerted us when there was an anomaly, and even gave us a root cause analysis. But that's still not enough, because I don't want to have to write hundreds or thousands of rules for my enterprise data warehouse. The last thing I'll mention is this unsupervised modeling check, which is saying: I found a really severe anomaly; all of a sudden, in my number of tickets column, the value 40 shot up. Now, we saw earlier the average went from around 10 to over 13, so this might be correlated. And here it's giving me a tornado plot, where it's saying: what if I took the most commonly occurring values for yesterday's data and compared yesterday's data to the data from the day before? For most of these values, they don't change much day to day, but the value 40 went from 0% to over 10%. The last visualization is just giving me the same root cause that we saw before. And why this is so important is that even if I hadn't known to create a custom check for this specific type of anomaly, Anomalo would have found it for me anyway. So, I'll pass it off to you, Shannon, but thanks everyone for your time. Happy to answer questions later.
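(Again purely for illustration, here is a tiny pandas version of the validation rule and root-cause comparison Zach demos; the column names and values are hypothetical, not the demo table's actual schema.)

```python
# Illustrative only: a hand-rolled validation rule (col1 * col2 == col3)
# plus a simple bad-vs-good segment comparison for root-cause hints.
import pandas as pd

df = pd.DataFrame({
    "price_per_ticket": [50, 50, 60, 40],
    "num_tickets":      [2, 3, 1, 2],
    "total_price":      [100, 150, 70, 80],   # 60 * 1 != 70 breaks the rule
    "venue_state":      ["CA", "NY", "NY", "TX"],
})

bad = df["price_per_ticket"] * df["num_tickets"] != df["total_price"]
print(f"{bad.sum()} records fail the check")

# Which segment is over-represented among the bad records?
bad_share = (df.loc[bad, "venue_state"] == "NY").mean()
good_share = (df.loc[~bad, "venue_state"] == "NY").mean()
print(f"NY share: {bad_share:.0%} of bad records vs {good_share:.0%} of good records")
```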
Thank you so much for this great presentation and for kicking us off here. If you have questions for Zach or about Anomalo, feel free to submit them in the Q&A panel; he'll be joining us at the end of the webinar for the Q&A with Peter. And now let me introduce our speaker for the webinar series, Dr. Peter Aiken. Peter Aiken is an acknowledged data management authority, an associate professor at Virginia Commonwealth University, president of DAMA International, and associate director of the MIT International Society of Chief Data Officers. For more than 35 years, Peter has learned from working with hundreds of data management practices in 30 countries, including some of the world's most important organizations. His books are many, the first starting before Google, before data was big, and before data science, and his latest is nothing short of awesome. Peter has founded several organizations that have helped more than 200 organizations leverage data, with specific savings measured at more than 1.5 billion dollars. And with that, let me turn everything over to Peter to get his presentation started. Hello and welcome.

And hello and welcome to everybody. Shannon, thank you so much, and Zach, what a great demo; I look forward to welcoming you back in about 50 minutes so we can chat some more, because Zach very well exemplified the problem that bad inputs cannot help but lead to anything other than bad outputs, even if you put something awesome in the middle. So first, a little bit on this: let's talk about the title, because it's always important as analysts to understand the questions that we're trying to ask. When somebody asks what is in your data warehouse, that may be the question they're asking, but I'm guessing it's more along the lines of: what do you do with respect to data warehousing? And that can be a very different question than, you know, what specifically is in the content there. And if we're going to ask it that way, we probably need to change the order of the words: what do your data warehousing operations consist of? Of course we're ending in a preposition, so that's not right; let's move that around: of what do your data warehousing operations consist? Probably not the way you'd ask it if you were thinking about it, but let's take a premise from that and look at it from four different perspectives. First of all, defining broadly what we mean by data warehousing; we're going to go to the DMBOK for some specific guidance around that, and look at it in the context of what everybody out there is experiencing to at least a degree, which is some sort of legacy-to-digital conversion, typically involving, of course, cloud facilities. We'll break that into two parts: the first focuses on the subset of data warehousing that is largely focused on addressing integration problems, and the second chunk will be focused on data warehousing capabilities that are really involved in the preparation of data in anticipation of it being presented. We'll finish up with some best practices.
And after a quick takeaway, we'll come back for some Q&A. I find that there are two types of individuals that are usually interested in this material. The first are folks that are just approaching the subject; even though many people have been in data warehousing for decades at this point, you may still be faced with a need to integrate separate, disparate data components, and somebody somewhere is clamoring for a more holistic view of the business operations. Management may have just pulled a magazine out of the airline seat-back pocket and said, a warehouse is great, let's build one. I mean, it sounds old, but it's still happening. The other type of people interested in this material are people that have overly complex or messy practices and are looking to make some improvements. So let's jump right in.

Let's start off with data warehousing as the result of an input-output, excuse me, input-process-output diagram, where the output from this process is warehouse data. By itself it's kind of difficult to show that there's value in taking data that we have in one part of the organization and duplicating it in another part of the organization; that alone would be very difficult to justify as business value. But when you start to show that this warehouse data can be used by a number of different applications, with different subsections of data optimized for certain types of presentation, that makes a very different story. You still have to account for this and make sure the investment pays off: if I told you this cost one dollar, we'd all be very excited; if I told you it cost a billion dollars, you'd think twice, or perhaps three times, before you jumped in with both feet. Given that, one thing of value in your organization that most people overlook is your ETL infrastructure, which in and of itself is a major source of data structure and transformation knowledge that should be incorporated into your digitization process. If not, you're missing the boat in a very major way. Our DMBOK wheel, if you're seeing this for the first time, is an expression of what it is to do data management, and you can see I've highlighted that data warehousing and business intelligence management is a wheel, excuse me, a wedge in our pie. We're going to see a subsection here, again our own version of the inputs, the processes (called activities in this case), the outputs, and some additional material; these are very good references, and I'm certainly not going to walk you through them. A definition of warehousing, though, is in order. Defining it broadly, it's a technology solution that supports business capabilities such as query, analysis, and reporting, and also the development of those capabilities. Some people say it's analysis of information that hasn't previously been integrated, and some people say it's a new set of organizational capabilities that we need to be able to maintain. If we shift now from data warehousing to business intelligence, even though these are often linked, the conversation around business intelligence has existed since before I was born, back in 1958.
And it's always been focused on decision-making processes and improving them, which means we can have certain technologies, etc., involved in the process, and the obvious place most of these start is trying to look at historical patterns and see if those historical patterns will enable you to predict or improve future performance. Some people pare the definition down to just using math in business. There's no standard definition, and you can see none of them necessarily involve technology in that sense. Then we have the term analytics, and I have a real challenge with that because I have trouble defining it; when you look at these definitions, they say, well, it's got to do with models, and that's neat. The way I would approach this is to say that most organizations come and ask, what data warehouse should we build, or what capabilities should be addressed with data warehousing? What they should ask is: how can warehouse-based integration address the business challenges that we have? And unfortunately, I'm going to start with a negative example, which is something we have to fight uphill against in all cases. You may not remember where this particular bit was from: it's Indiana Jones and the Raiders of the Lost Ark, and if you remember the story, at the end of the movie the Ark gets stuck somewhere in a warehouse, never to be seen again. That never happens, right? Well, here's a live example for you from a healthcare company that had 1.8 million members, and of those 1.8 million members, 1.4 million in the data warehouse were also marked as providers, which, if you think about it from a business perspective, meant almost everybody could have their own personal physician. On the other hand, there were 800,000 in this data warehouse that had no key; not having a key meant that they were inaccessible under any circumstances. 29% did not have a provider social security number, so there's another roughly 30% of the data that's inaccessible. Actually, only 2% of them had the required nine digits on that particular provider number. Sometimes one number was a synonym for the other; again, you can see it's confusing. And of course, then the boss found out it had one user and cost 30 million a year to run. This was a problem, because the boss clearly understood: I can take a room full of MBAs, lock them in, slide pizza under the door, and accomplish this analysis in a much faster fashion. Now, the reason we have to deal with this is because our systems grew in our standard legacy fashion: they grew up as individual siloed applications, and most of the time data was formed around them. The way to look at this problem is to say that as things connect to other things, they become more complicated, more brittle, and less flexible, and of course as business needs change, the need to change the technology around them often accompanies that. So, ideally, what we'd love to do is integrate this by saying, hey, let's put some new technology in the center of this and start putting data in there, and that'll give us organizational data, and clearly we've got the ability to distinguish it because it's in the center. Then we can re-architect around these other aspects and come up with a wonderful hub-and-spoke model, assuming that's what we've decided is useful for the organization; here we're not talking architecture, just trying to get across the concept of the data at the center. That said, that's a very traditional view of warehousing.
There are some things that should be looked at, which could take an entire webinar to go through, things such as linked data, which can do these things for literally a tenth of the cost. And that is something worth looking into, because there's such a large community and a vibrant set of organizations around it. It's not a panacea, but there are some very good answers in there that are tremendous starting places. In addition to that, there's the concept of virtualization: do I really need a physical warehouse, or can I virtualize it in some fashion? The answer, of course, is yes, with the ultimate result being cloud. I'm showing Amazon in this particular case; similar kinds of things are true of other offerings. In fact, Gartner is the one that has recommended that cloud selection is really a question of how easily you can access the data that's available under that cloud, as opposed to the specific cloud features. Also, having said cloud three times, I suppose we should define it. The real key is, first of all, location independence, and at the same time a shift of risk so that you're running the right operations in the right regions, all of that, and virtualization is a clear component of this, with the ability to spin capacity up and down, with details that are abstracted away from everybody's particular daily work. Scalability is another often-touted advantage, where people say that as your demand increases, if you're doing it from a physical perspective, you have only the option of adding capacity in a step function, whereas in the cloud the opposite can be done, tracking very closely with supply and demand. Similarly, however, it can be expensive, because costs here increase roughly linearly with every increase in usage, and you get out of the economies of scale that you're used to. Of course, every organization has got its own road to the cloud, which is a wonderful thing, but notice they're not talking specifically about warehousing; we're just talking about different cloud capabilities and options, and again we could go into another webinar on that, but I'd like to bring this home. Whether we're talking about clouds or warehouses, there are three things that we need to pay attention to. First of all, I hope you're all in agreement with me that the data that is inside the cloud should be cleaner than the data that is outside of the cloud. If the opposite were true, if it were dirtier, I think we'd have a different conversation going on. The data in the cloud should also be smaller by definition. The reason it should be smaller is that it should be architecturally more shareable; you can architect shared structures that will help a lot at the organizational integration level. These three are particularly important because most of what happens in the warehousing environment that's problematic follows relatively poorly done practices consisting of literally forklifting the data into the cloud. Notice the cloud also has to expand to make sure it can ingest all the data. The problem, of course, with forklifting is that there is no basis for the decisions that are being made, if there are any decisions being made at all. It completely ignores any guidance from an architectural or an engineering perspective. There is no recognition on the team that these ideas are missing in the first place, and let's be frank: 80% of all organizational data is redundant, obsolete, or trivial.
So the way it should be done is that the forklifting brings this into an organizational context where we shrink it, we improve it, we make it cleaner, and we make it less in volume and more shareable. This is true whether you're warehousing or clouding, and of course typically you're probably doing both at the same time. Interestingly, this brings an opportunity for something that we call data branding. Now you can say that the data in this warehouse, or cloud-based warehouse if you want to be precise about it, is cleaner. There is less of it that you have to get to know and wade through in order to find what you're looking for, and it's an easier data set to look at. Of course, the real problem is that if you just move your data into the cloud and try to clean it in place, like these individuals working in the picture, it becomes much more difficult. So let's dive into this first integration piece. There are two purposes for a data warehouse in general, and they're not precisely articulated, but you'll see they're subtly articulated here. First of all, the purpose is integration: you have disparate data sources, and even though most data is never analyzed, there's a lot of opportunity around this. I was working with an organization that in just one week had identified 100 more points of integration at a central level than they had really anticipated having, and each is going to have the same kinds of input and output data characteristics, and quite frankly it probably, to some degree, is working at the moment. So there is some downstream knowledge that gets incorporated upstream; we're able to take feedback from what's happening and improve the overall process. The other type of warehouse purpose is literally preparation, the last mile of the data before it becomes part of the business activities. These are usually closed-ended activities, which gives one last opportunity to check programmatically for quality measures of the type that Zach was describing to you just moments ago. So Zach, I'm assuming when you come back online we'll talk about your ability to integrate into this environment. But here's the key for everybody: choose one of these up front, make a decision, and try to stick with it, because the choice will have an influence, and that's what we'll spend some time thinking about: how this influence needs to be accounted for. We'll first do the integration and then we'll do the preparation concept. On integration, we're already in it. If we ask the question, how much is it going to take to connect everything to everything else, the answer is a frightening n times n minus one, divided by two; that's our formula for this. And let me look at a similar example. The client that I was working for at the time, the Royal Bank of Scotland, told me... I'm sorry, the Royal Bank of Canada; I must have been thinking about Scotland for some other reason. They had 200 major applications at the time, resulting in about 5,000 batch interfaces back and forth between them. Those are good numbers to take a look at, and in fact, if we start to look at the numbers, here's the problem with connecting everything to everything else. Obviously, the worst case is what's being plotted on the yellow line, but let's look at the 200 and the 5,000; that's not a bad place to be, necessarily, when you look at where you potentially could be.
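(As a quick worked instance of that pairwise-interface formula, using the 200-application figure from Peter's Royal Bank of Canada example:)

```python
# Worst-case number of point-to-point interfaces among n applications: n(n-1)/2
def max_interfaces(n_applications: int) -> int:
    return n_applications * (n_applications - 1) // 2

# 200 major applications -> 19,900 potential interfaces,
# versus the roughly 5,000 actual batch interfaces cited above.
print(max_interfaces(200))
```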
But my goodness, 5,000 actual interfaces is very, very problematic. Now, the basis for data warehousing is incorporated, and really best articulated, in Bill Inmon's very fundamental Corporate Information Factory. There are lots of used copies floating around; it might even be on Google Books by this point, so take a look at it. The main thing to think about is that it looks at the process of producing data and data products as a production process, and so in that sense it has a very good engineering approach. Here, for example, is the same exact picture as in the first part, but applied to a specific challenge, in this case hospital performance management. You can see all of the various claims information comes in through these top pieces and goes into this one platform, where there are different types of access: a workflow access, a patient and physician portal, and the dashboards that they have in the center. To be very specific about this, it is called an Inmon warehouse, and his definition is a subject-oriented, integrated, time-variant, non-volatile collection of summary and detailed historical data used to support the strategic decision-making processes of the organization. It's a very precise definition. What he's talking about is taking various sources of data, bringing them into precisely defined staging areas for a first set of transformations, then moving them into a warehouse, a governed environment; in this case, again, the branding opportunity is still right there. Then they are moved out of that warehouse into a series of data marts, where they are accessed by the users. Nobody directly accesses the data in the data warehouse, because it's operating at what we call third normal form, which is not the easiest form to understand or navigate, and so unless you have very, very user-accessible opportunities here, you will have very little purchase from people trying to actually buy in. Here are some typical pros and cons. You get about as close as you can get to zero data redundancy. We can do things like enforce referential integrity and attribute-specific indices that give us flexible querying, and things like that. Joins can be expensive; again, you don't really want to work in it, you want to store things in the data warehouse and then pull pieces out of it to work on. This makes it a big operation, and one that we have to specifically address from an engineering-oriented versus an architecture-oriented perspective; there's no one correct or incorrect. But for what we've been talking about, you can see it's a more complex project, particularly since the second approach I'll give you has an easier definition. So the engineers help develop these technical designs, which are very complex, and an architect is required to integrate the engineering perspectives and bring them together, and so it's important to make sure that you have the ability, if you're moving on this type of journey, to emphasize engineering talent, particularly a data engineering background; that will be helpful if you do this or are considering doing so. Let me go back. Okay, particularly if you're doing something such as Target was reported doing in this particular example. They are showing a slide from a Target data scientist, who was unfortunately
not aware that perhaps the activities they were doing, such as predicting whether guests were pregnant or not, might not be tasteful, and actually presented this at a conference. A newspaper reporter from the New York Times saw this and wrote an article, which Target later disavowed and said never happened; it was a big brouhaha. This is a very carefully engineered system: anything that you touched at Target would link back in to whatever you were doing. The challenge with this, from an integration warehouse perspective, is that you're obviously going to have these diffuse applications here all the time, but somebody who might be called a chief analytics officer, for example, would only be concerned with the right-hand side of this diagram, everything to the right of the yellow line that you're looking at. And unfortunately, that doesn't leave the best opportunity, which is to take that learning-and-feedback arrow that's there and point it back into the black box of the data management practices, where you have even better leverage, even more opportunity to impact things that are happening in your environment. I'll give you two very quick examples and then we'll move on. The first one is an organization that was very, very good at feeding people. Their ability to crank out thousands of sandwiches quickly in an emergency was par excellence, and they were just curious: what else could their data do? And so, after some interesting work with them, it was discovered that in a certain city, where they had good definitions of food deserts and things of this type, they were able to take a bus line that used to run down, let's say, Main Street, or Broad Street in Richmond, Virginia; this is the dashed line that we have out there. If the route deviated slightly, just a small change in the routing, it would impact thousands of people in terms of their ability to obtain good-quality food, and the organization here, a philanthropic organization, felt this was a great way to enhance their own use of the data. There's no way they would have been able to come up with this if they had tried to focus on an end product; they simply had to go with an exploration mission. So let's dive down into preparation here as well. Business intelligence is a very, very interesting field; unfortunately it's not very well defined, or not very objectively defined, and so while you get these nice-looking pictures, it's kind of hard to really operationalize it. But the basics are that users can drill down anywhere, and that there's a sort of conceptual cube that they use to figure out what they're doing, giving you the ability to summarize or to drill very, very specifically. And of course the emphasis here is on this n-dimensional cube, in this case showing three dimensions: product, time, and geography. You could just as easily add in other dimensions, not necessarily difficult from an engineering perspective, but importantly, not without rebuilding the data cube. That should make sense: if we add another dimension, we'll need to rebuild the cube, or if we significantly change a dimension, we'll need to rebuild the cube; hence why you saw that previous separation here. We'll see that in this case it's much more integrated and straight-to-the-answer, and this type of arrangement permits different uses, where people can dive in and look at cancer patient revenue, or the revenue by disease.
In other words, the East region, or the total costs and revenue from top-down perspectives, giving you different facilities, different perspectives. Again, more specifically, you might be able to take a data set that's very large and cut it down to say: I want everybody who has income less than 100,000, who is younger than 30 years old, and who lives in the city of New York; so now we've gone from 30,000 customers to 6,000 customers. Who has purchased something, X, whatever it is we're looking for, in the last seven days? That takes us from 6,000 customers to 800 customers. And then we ask who lives in New York, purchased in the last seven days, and who supplied those pieces, so you can see a kind of reverse engineering in this particular approach, asking what's going on here. You could not do that without the data structures pre-existing; it's very difficult to do in real time without good data structuring around this. Another example, just a banking analysis: we can have different cube dimensions, again social status, geographic location. Some organizations are now trying to predict whether you're a good candidate for credit or not based on your social history. It's an interesting proposition; we'll probably have to learn more about it before we can figure it out. Your approach in the banking environment is going to be to balance the loan against the risk of default. You can loan more money to people who are likely to pay you back, but you will charge them a lower interest rate in order to do that. And so that's going to give you a couple of solutions for where to go and what to do, and this can be the kind of tool that can help you answer those questions. So in this context, we call this a Kimball warehouse, a copy, and notice his definition is much simpler than Mr. Inmon's definition; by the way, the two have never agreed on anything except, as I'll show you in a moment, a newer hybrid design to get you started. But it's important that you understand the distinctions, because this is what's out there. So it's a copy of transaction data that's specifically structured for query and analysis. There are a lot of traditional systems coming in as the data sources; they go into a staging area, but instead of being put into a centralized data warehouse, they're directly put into use. This then allows you different types of ability to report, and the beauty of this is something called a star schema. In fact, whatever you're trying to measure, the unit you want to add, subtract, and go up and down in abstraction on, is the central table, and then if you incorporate the right types of dimensions, each of those dimensions represents a possible data analysis. And so it gives you a lot of specific cases that are very easy and very quick to get to results; it's simple to design, but not so good for turning into something else later on. Hence the guidance earlier on to pick one and decide which way to go, although again I'll show you a trick on that. The key, of course, is that if you don't build in those questions, you have no ability to answer them. And so, in that sense, they're going to be focused in on one specific fact, and if that fact doesn't fully satisfy your question, you're probably going to have a frustrated customer.
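(For readers who want to see the shape of this, here is a rough, hypothetical sketch of a Kimball-style star schema and a slice query in the spirit of the customer-narrowing example above; every table and column name is made up, and a real design would be driven by the facts and dimensions your questions require.)

```python
# Illustrative star schema: a central fact table surrounded by dimension tables,
# where each dimension joined in is another possible axis of analysis.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, age INT, income INT, city TEXT);
CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT, supplier TEXT);
CREATE TABLE fact_sales (                      -- one row per sale: the thing we add and subtract
    customer_id INT REFERENCES dim_customer(customer_id),
    date_id     INT REFERENCES dim_date(date_id),
    product_id  INT REFERENCES dim_product(product_id),
    amount      REAL
);
""")

# Narrow a large customer base by slicing on dimension attributes.
query = """
SELECT COUNT(DISTINCT c.customer_id)
FROM fact_sales f
JOIN dim_customer c ON c.customer_id = f.customer_id
JOIN dim_date d     ON d.date_id     = f.date_id
WHERE c.income < 100000 AND c.age < 30 AND c.city = 'New York'
  AND d.full_date >= date('now', '-7 days');
"""
print(con.execute(query).fetchone())  # (0,) here, since the toy tables are empty
```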
What you want to have here is architecture talent, and to make sure that you have the ability to attract enough architects to properly give you an environment for the system, because the key to all of this is that data demand is not slowing, it's increasing, and our capabilities to keep up with that are not increasing the way they should, so these skills are going to be more expensive. That's what you should do, and I hope I don't scare you all with this next set of statistics, but it's important to discuss. Everybody wants to do good, efficient data analysis, so we'll all agree that some data preparation is simply inevitable. The question is how much. Now, if I start you out at 50%, that means you'd get sort of a 50/50 approach: for every dollar that you invested, you'd get 50 cents' worth of analysis and 50 cents' worth of data preparation time, and of course what we'd like to do is make data preparation time as inexpensive as possible. So everybody goes, well, how about if we could get to that? The problem, as anybody that's been in the profession, like many of you on this webinar, knows, is that the answer is here: you spend 80% of your time doing data preparation. And everybody knows this except for management, who's paying for it. That's a challenge for all organizations. Here are a couple of specific examples from the airlines. Think about the warehouses that American and United Airlines have with their frequent flyer data. Just to give you an idea of how big and how lucrative these are: in 2020, at the height of the pandemic, American Airlines was at 6 billion dollars in market capitalization, not a terrific performance, and their AAdvantage data was valued at between 20 and 30 billion dollars, so considerably more than the airline itself; similar numbers there at United. This is all due to the fact that organizations have been performing very poorly in their data analysis areas, and even these big guys, who should be good at it, are experiencing challenges around that. The solution to this, now just to give you some whiplash, is to start looking not at empowering data scientists to a greater extent; they do a great job, they're doing fine work in terms of what they're doing, but to look at another avenue, another front in the war, if you will. And that front is, of course, knowledge workers. Why knowledge workers? Well, too many organizations simply drop better data in front of their workers and say, that's great, but that's a problem. When we look at this in terms of a problem, what that means is this: these numbers are from the Data Literacy Project, which has done a great job of doing work in the area of data literacy. Here's a number from their previous survey, which is 48%: they don't look at the data, and they just make a gut decision. And that's disheartening; we take a lot of trouble, we give them data and we give them dashboards, we do whatever we can to give them good-quality data, and they still defer to their gut. Well, that's not the worst of it: C-suite executives are worse still; two thirds of them will defer to their gut. And when you're faced with a workforce that has the ability to do something, it will, 36% of the time, find another way to do it, and I don't know what you'd do if you're not doing it in a data-focused way, but that's probably my own myopia talking. Worse still, 14% of them avoid the task entirely.
This is happening in a very significant way in our organizations, and we need to do some things to address it. The key is, first of all, to recognize that data, whether it's in a warehouse or not, is always approached by people who think they're seeing the entire space, without realizing they have just the tail of the elephant; they only have one aspect of what they're looking at in the data. And it is important to get everybody back to one place through a data literacy exercise. It's built off of Maslow's hierarchy of needs; I'll take you back to high school real quick, take a big breath, right? It starts off with: if you don't have your clothing and shelter needs met, you can never be safe; if you can't be safe, you will not be part of something that is bigger than yourself. If you're never part of something that's bigger than yourself, you won't know your boundary, where you end and everybody else takes over, and therefore you'll never get to what we call self-actualization. We've come along and stolen the term, and now it's called flow; okay, I don't care what it is. But these five levels are kind of important, and we can do the same thing in looking at data literacy by saying: we need to educate mobile data spreaders, but they're beyond our control; similarly, we'd like to educate adult data spreaders, and they are also beyond our control. But as soon as you've crossed this blue line into the knowledge worker category, of which there are a billion, we now have the ability, through codes of conduct and through employment agreements, to specify specific behaviors around data. There are a couple of other levels; we're not going to get to all of them, but for the data profession, just know that everybody should be part of all of that knowledge, and that everybody needs to go back to it, particularly data scientists and cyber professionals. You'd be amazed at the number of data scientists that don't understand there are programmatic ways of helping knowledge workers become better at managing data. I'll give you just a preview, with the data scientist still focused on helping the organization drive on through. First of all, this particular data scientist, not the one in this picture, that's just a stock photograph, is in the elevator, and the boss gets on at one floor, looks down at the data scientist, and says, hey, how's it going down there? And the data scientist looks up and says, ah, yes, I've actually gotten to 72%. And instead of getting a "wow, that's great" kind of accolade, steam comes out of the boss's eyes and ears and nose, and, looking down, the boss says, in this organization we never do anything less than 110%; don't tell me anything less than 100% in the future, are we clear in our understanding? Clearly, that was just a complete miscommunication and an unfortunate misunderstanding. More unfortunate, though, upon further examination of the specific incident, it turned out that a 68% achievement would have been profitable two years earlier. That's really the point of this: there's absolutely no way that most data scientists understand the business context in which the solution needs to be delivered, and we need to do a lot more work around that. Similarly,
we need to have knowledge workers understand that they are becoming stewards of somebody else's data, and that there are specific fiduciary responsibilities they carry: making sure that the process demonstrates value, and that the process and the data both stay current, because there are, as I said, specific fiduciary responsibilities, and we are all literally swimming in the same swimming pool. And what we have to do is this: while it's great to have things in a data warehouse, and where data things happen, yay, we're good at celebrating that sort of thing, we also need to become good at turning those things into things that the people in the corner office care about, whether it's dollars or whatever it happens to be. We need to make sure that we can move in a direction that helps us do that, and the way in which we do that is by strategy; strategy is critical to the development of these approaches. The idea of strategy was originally not a business term; the business consultants kind of hijacked it in about the 1950s, and it became a thing, a plan that you looked up and went to when things went wrong, step 27 stroke 13, you know, blah, blah, blah. Well, it turns out that's not very operational, and the better use was what was used before the business consultants picked it up: a pattern in a stream of decisions, which says that strategy is much closer to a process than it is a product. I'll give you just three examples here. Walmart's former business strategy, you all know it: everyday low price. Pretty straightforward, and it was learned from the top of the organization to the bottom, by the entire customer base and every supplier that works with them; a very successful implementation of a strategy. Wayne Gretzky's strategy: skate to where he thinks the puck will be. There's a great Wikipedia article if you're looking for more details on it; Wayne did a wonderful job of putting that together. Strategy number three: if we're in an adversarial situation, facing the good guys and the bad guys, we use one strategy if we're both on the floor, right? We use another strategy if the bad guys are up at the top, and we use still a third strategy if the bad guys are up at the top and we're on the bottom. So you see why it's important to think of strategy as a process that surrounds what do I do, given that I've seen this pattern in a stream of decisions. And think about it from this perspective: strategy guides workplace activities. So if we go straight to a data strategy then, with the workplace being as big as it is, and understanding the metadata around it, the data strategy consists of high-level data guidance focused on specific business goal achievement when faced with a stream of decisions around data. And we have to make people understand that data decisions are what most managers are making, without understanding that they are making those data decisions, including in the area of warehouse and cloud operations, etc., etc. A data strategy articulates how the data moves you're going to make will support the organizational strategy, which is probably a mixture of proactive and reactive types of measures. Let's move now into sort of a data practices area. The idea here is that we really need to reframe the question. Most people start out, when they're approaching data warehousing, asking: how do we build this data warehouse?
That seems like a good one, or: what should go into this warehouse I've already built? The real question is: I need to specifically articulate a business challenge, and I need the warehousing capabilities that can solve this business challenge; look at it from that direction. You can address a class of business challenges, but it contains things like foundational practices and specific project deliverables that should be attempted when the organization has reached sufficient maturity, and an understanding that even if you do not actually get it right the first time, you have the option of constantly evolving it. So don't turn it on and expect it to be successful, but instead say, here's the first version, and we're going to improve these particular areas; again, it could be that speed is one of the important pieces. So that's the data warehousing component of it. I want to turn to the term analytics here and just sort of rag on it for a minute. Here's analytics explained in a graphic in three areas: connect any data source, explore and visualize, and share the story. Sounds great. There it is in four, here it is in five, there it is in six, there it is in seven, and there it is in eight. That's Peter's 3-4-5-6-7-8 method: if I can pull these graphics out of Google Images within a few minutes, the area is not precisely defined enough for us to even talk about it. Let's just look at the typical textbook; this was offered to me to teach by one of the publishing houses: Business Analytics. Well, what is that? This is just analytics, so let's get rid of the unnecessary word. "Information technology" is a highfalutin phrase; let's drop that and say hardware and software, which is what we're really talking about. "Statistical analysis" and "quantitative methods": they're the same thing, so let's just use the word once. "Mathematical and computer-based models": same comment, just models, to help managers gain insight, blah blah blah, to make better decisions. That's really what we're talking about here, in the longer context of shortening the process of transforming data into actions through analysis to solve problems, supported by tools. So what started out at close to 90 words has been reduced by about two thirds, to about 31, and it's the same process. What we're really talking about here is the understanding that this is over-described and that we need to just focus on the basics, and particularly notice that even with this definition, the business context is missing; whether we're teaching MBAs or we're teaching data scientists, the business context has been missing. What we really need to stop saying is "analytics," because it doesn't mean anything, and say instead "data analysis," and understand architectures, because all systems minimally have these types of architectures in them, and your data scientists and data warehouses will have to complement these structures instead of competing with them; but this can only happen when they are understood and documented. Again, the understanding is the idea that there's a digital blueprint illustrating the commonalities and intersections among these architectural components, which forms the basis that your digitization efforts are supporting, and that the same information is shareable among the systems and the business subject matter experts, as well as the technology people involved. And I told you there was a nice answer to this.
It turns out that both Mr. Inmon and Mr. Kimball got together and said, this is the place where we should start, and it's called a data vault. The reason is that the data vault encapsulates business keys along with the data, so the different sections can change independently. A very simple example might be a Spanish subsidiary that had data recorded up until a certain date in pesos and then after that in euros; now, while that's not a huge thing to overcome, those are the types of issues that are easily handled by this, and most importantly, evolution into either one of the two other designs that I showed you before is trivial from this starting point. So the data vault has historical, long-term storage, along with the business keys, and it retains all of the lineage information, and consequently it has a different series of structures than most, which are hubs, links, and satellites. Again, a little bit more structure up front, a little bit more planning, but very, very good results. The pros associated with this are that it's very simple and it provides much faster results. The cons are that it's a little bit more complicated and there's not quite as much adoption around it as there is for the Inmon and Kimball approaches, but the advice from both of these two gentlemen is to start with the data vault and then evolve into one of those two. When you're looking at warehousing efforts, particularly initially, there are some great starting references out there, and I won't say too much more, but for many of them you can buy the books and the models are right there, able to be used for any kind of ETL. For this model I'm giving you the URL so you can go download it directly; it will handle virtually any kind of ETL metadata transfer. This becomes the central point for things going into and out of there. Similarly, organizations such as the OMG have provided all sorts of meta-models around these types of topics. What we have to work on, as I mentioned before, is knowledge worker productivity around warehousing. We need to make sure that instead of having a 30-million-dollar-a-year warehouse that has exactly one user, a very easy cost to calculate, where it's probably clearly understood that that one individual is not adding 30 million dollars of value, we instead look at the, you know, death by a lot of unnecessary small cuts. We remove one click from a repetitive process, add them all up, and then start to account for it. That's how I can say that I have saved organizations more than a billion and a half dollars over the last 40 years doing this type of work. I basically followed the wonderful theory of constraints, which holds that in any system there is going to be something that is blocking it, kind of like clogging the veins, that makes the entire system no stronger than the weakest link in the chain, and that we need to adopt a find-it, fix-it, and move-on-to-the-next-one approach. And if that sounds simplistic, it is beautifully simplistic, from W. Edwards Deming. He uses plan-do-check-act, or there's another one he had originally, plan-do-study-act, but everybody goes with PDCA; it just seems to roll off the tongue easier, and everybody understands what we're talking about. Thinking about the understanding component, I'm going to tell you two quick data stories; there's a lot of guidance on how to do this, but let me show you the reasons for these two particular stories around data.
The first one is that we were involved with the military suicide prevention effort; the epidemic has been going on since 2010, and it's a horrible situation to have more of our warfighters hurt by their own hands than by the enemy's hands. At the time we just happened to be close by and were thrown into the middle of this thing, and we started mapping data sources and trying to figure out what was going on. Very quickly, we discovered we were operating off of a 30-by-30 matrix, in a room with what we called our council of colonels. All of these colonels would be sitting around, and Colonel X would get up and say, my row is row seven; please put a U in column nine and an R in column 14. Right, and you can see how that was very unwieldy, and again, well-intentioned, because everybody wanted to do exactly the right thing. Given all this, we had a chip we could pull with the Secretary of the Army's office, who came in, slammed the leather-bound document on the table, and said, we are doing things this way; has anybody got any questions? And it really did make a difference, because while everybody was correctly saying, my purview of authority extends this far, he said, I'm going to say that all this data belongs to me, and if anybody wants to make an appointment with me and find out why I can't use my data to save my soldiers' lives, they can absolutely come and talk to me. That, of course, transformed the entire operation. Most importantly, though, the individual told me that I could use that story with attribution, and I have told that story to more than 100 corporate CEOs, and not a single one of them will take that same step. They permit organizations to fight over ownership of data and continue to have problems around that area. The data set that we put together was clearly an integration data set. So that's the idea: we don't know yet what the answer is going to be; we have done work on the problem and we understand some more about it, but we are absolutely not done in that particular area. The second example, quickly, just before we finish up, is the January 6 events most people are familiar with. The idea was that the Parler users who were attacking the Capitol were using a very leaky social media system. Were they really at the Capitol? This particular plot seems to indicate that they were, and this one is a multimedia version of the same kind of thing: where were people, and what was going on at the time. ProPublica put it together into a timeline that has all of this type of data, so you can scroll along the timeline and see exactly what is happening at 3:25 p.m. near the Capitol, around DC, or inside the Capitol. And the coup de grâce, which was just wonderful from an explanatory perspective, is each of the cell phone hits that are down near the rally; you could see the cell phones move to the Capitol, so there's no question that all of that data, even though it was perhaps illegally obtained, was still very conclusive that people from the rally did attend that particular event, and there's a tremendous amount of useful data that has come out of that. They didn't know what they were going to need, and interestingly, what they did find in the process was a lot of cooperation from the technology community, the technologists who have been helping. So far, almost 1,000 people have been charged in this area.
Well, we've spent the last 50 minutes looking at warehousing, understanding that the DMBOK does have a role to play here and offers some good guidance, but that the general thing we're looking at is obviously the move to digital, and that there are two types of data warehousing focuses. One emphasizes an engineering approach, where we don't want to be on the bleeding edge; staying at the leading edge requires adapting and trying to further and better engineer what we're putting together. The second is preparation oriented, which is the last mile. Sometimes the two of them work in conjunction, of course. The best practices revolve around having an understanding of what you're doing, adopting some sort of repetitive plan-do-check-act type of cycle, and culling your data on a regular basis: repeatedly looking at strategic direction as it evolves over time, consciously removing and simplifying, taking data that you're not utilizing much out of the environment and making things generally much simpler. So we're going to get to a couple of quick takeaways on this, and then we'll get ready for your questions and invite Zach back to join us. At the top here, and I see this a lot, is one of those lists of 16 reasons for data warehousing failure, and I just don't like that; one of them is simply "poor availability", and yes, I understand that could be a problem, but I look at the entire process in a very different fashion. So this is the famous use case up here in the right-hand corner; it's an icon, of course, so please don't hurt your eyes trying to read it. What I'm trying to do is ask, what's happening here? Here's an example from Wikipedia that says a user is editing a particular article. Now, use cases are kind of useless unless they have a fully integrated glossary, because we can't capture non-functional requirements without that integrated glossary; we don't know whether an apple is really an apple, or whether an apple is instead a piece of a leaf that is falling from the tree. These non-functional requirements are detailed in the system's architecture, and not starting with that architecture for the warehouse, which most people don't, requires the average warehouse to be built and rebuilt seven times before it becomes useful. Now, I've put together a bunch of additional references for you here. I just want to make sure you understand we've got some other events coming up, including that we'll all be gathering in DC the first week in December for DGIQ; looking forward to seeing everybody there, and I may not have that date of the 13th exactly right, but anyway, there are other upcoming webinars, books for sale, blah blah blah. I did document the corporate information factory quite a bit in here, so I'm not going to walk through it; just know that there's a lot there. Similarly, the DMBOK gives you a bunch of specific starting-out guidance, but just consider these as preliminary starting points, and then there's a good deal of reference material around all of this. Some of it may be contributed by yourselves, and I certainly look forward to expanding the collection and making it more useful to everybody as we go forward. So, my gosh, I'm finishing 30 seconds early here today; I thought I'd run way over, must have been that extra cup of caffeine before we jumped on. But it's certainly good to be with everybody this afternoon, and we'll go to your questions if you've got any for us. If you have questions for Peter or for Zach, feel free to submit them in the Q&A, and just to answer the most commonly asked questions:
Just a reminder, I will send a follow-up email for this webinar within a day, on Thursday, with links to the slides and links to the recording. Everyone's really quiet out there today; it's a beautiful day at this location. There is a question that came in, Zach, during your presentation. You know we're a vendor-neutral company, so we don't ask for product comparisons from one company to another, but where does Anomalo compare, or what makes Anomalo stand out; can you expand on where you sit in the stack? I think I can do this without specifically comparing us to Monte Carlo. I would separate the realm of having confidence in our data into two different categories. One is data observability, and data observability is shallow-level monitoring. At that level we might only care about, hey, is my data arriving on time, and is it there in my table? I want to check that for tens of thousands of tables, but I don't really care much more about the data beyond basic metadata checks: did a column data type change, things like that. So that's one end of the spectrum, and there are a lot of tools that specialize in that. At the other end there's data quality, which is much more focused on going a lot deeper and caring about the content of the data. We know the data is there; that's pretty basic information to get. But do we know if that data is correct? Because the only thing worse than making a decision based on no data is making a decision based on wrong data. If we make a business decision on that, it could have really bad consequences for the business, and stakeholders could be exposed to some type of risk. It makes us look bad, right, Zach? Yeah, yeah, I've definitely been in that situation before. By the way, were you in Richmond at that point? I was not, I was in Tysons Corner in the DC area. Oh yeah, I grew up around there. Anyway, we won't bother with the details, but I know absolutely what you just said was very important. Capital One is going to have a completely different focus on data quality because of their regulated business, but they also very clearly articulated that information was a strategy they were using to grow during those years, and so if the data wasn't right, they were simply growing incorrectly. And they would talk about it internally as well; otherwise it's like saying cancer cells are good, let's grow more of them, because they grow really fast and they're really strong. Exactly. So that's a great point, and data observability, knowing if my data is there and if it's arriving on time, is important. But ultimately, that will be caught one way or another. It might be caught late without data observability, but someone will very quickly know as soon as they query the data: oh, it's not there, oh, it's not fresh. Ideally we would catch it earlier, and that's where data observability is important, and Anomalo already has those features to check that data is fresh and complete. But where we might not catch issues is the actual data quality. And so there needs to be active monitoring, and in a way that's scalable.
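As a hedged sketch of that distinction, assuming a pandas DataFrame loaded from the ticket-sales table with invented column names, a shallow observability check and a deeper content check might look like this in Python; this is illustrative only and is not how any particular product implements it.

```python
from datetime import date, timedelta
import pandas as pd

# Hypothetical column names; load_date is assumed to be an ISO date string.
VALID_US_STATES = {"NY", "CA", "TX", "FL"}  # truncated list for the sketch

def freshness_check(df: pd.DataFrame, load_date_col: str = "load_date") -> bool:
    """Observability: did any rows land for yesterday's load date?"""
    yesterday = (date.today() - timedelta(days=1)).isoformat()
    return bool((df[load_date_col] == yesterday).any())

def content_check(df: pd.DataFrame) -> bool:
    """Quality: the rows are present, but are the values plausible?"""
    prices_ok = bool((df["ticket_price"] >= 0).all())
    states_ok = bool(df["venue_state"].isin(VALID_US_STATES).all())
    return prices_ok and states_ok
```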
So unsupervised machine learning, we can use that to cast a wide net and find anything that could be significant, say a one to five percent change in the data, something that's different in yesterday's load versus other loads, instead of writing a long list of rules, whether they're done in a no-code way or with a bunch of custom Python, which gets very difficult to manage. As an enterprise scales, a lot of issues come with that. So it's a bit of a different paradigm with Anomalo, thinking about it in a very different way. On top of that, of course, there's always going to be a time and place for custom checks, and the approach we take is that it should be as easy as possible; one shouldn't have to be a very technical person in order to monitor the data that is important to them. If I could just expand on that for a second, because I find this is really helpful for most organizations: the idea of proactive monitoring, which is so important to what's going on. We're not talking here about the notification buttons on your phones that are constantly going off 200 times a day, but about the fact that you can fine-tune this to the point where you know there are things you should be told about, and routine things don't constantly go beep beep beep and distract you. What sort of percentage do you find that organizations decide to hone in on, from that implementation perspective of proactive monitoring? I actually really appreciate that question; it's really important. It breaks out into, I guess, two different parts. One is what data do we care about monitoring. What I generally see is that when we're onboarding a customer, they start with maybe one or two teams. They could be data engineers, data scientists, or end users who care about a subset of data. This could be anything from 10 to 300 tables to start out: the things that are really important, the ones that impact end users, where our analytics could be wrong and we could make the wrong business decision. That's where organizations tend to start, and then from there they expand over time to most, or almost all, of the enterprise data warehouse. So for example, Discover Financial, the credit card company, is one of our clients, and I think they are at 7,000 tables now. Whereas Block, I mean Block slash Square, the payment processing company, they've gone to Detroit. Sorry, they moved to Detroit, right? Yeah, yeah, I think so. They've moved from a couple hundred tables to, I think, 12,000 right now, and they have 200 users using Anomalo. So it is an over-time endeavor, but we start with the tables that are most important and then bring in other teams who care about other tables as we continue to have success with that monitoring. Now, on the flip side, there's also another very important point: I don't want to be getting false positive alerts. I do not want to be getting so many notifications that it's difficult to find the ones that matter. For example, I don't use Facebook anymore because all the notifications I got were not things I wanted to see. It's the same thing as an end user: if I get too many notifications, it's the same as getting no notifications, because it's just noise. So false positive suppression is extremely important, and Anomalo takes it very seriously; there are a lot of different ways in which it implements false positive suppression.
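To make the wide-net idea concrete, here is a rough Python sketch, assuming a pandas DataFrame of daily loads with invented column names; it is a simple baseline comparison, not Anomalo's actual models, and it folds in one crude form of false positive suppression by ignoring very small segments.

```python
import pandas as pd

def segment_drop_alerts(df: pd.DataFrame, date_col: str, segment_col: str,
                        drop_threshold: float = 0.30,
                        min_baseline_rows: int = 100):
    """Flag segments whose latest daily row count fell well below a trailing baseline."""
    daily = (df.groupby([date_col, segment_col]).size()
               .unstack(fill_value=0)
               .sort_index())
    latest = daily.iloc[-1]                     # most recent load date
    baseline = daily.iloc[:-1].tail(28).mean()  # trailing ~4-week average
    alerts = []
    for segment in daily.columns:
        # Suppress tiny segments: small counts swing wildly and create noise.
        if baseline[segment] < min_baseline_rows:
            continue
        change = (latest[segment] - baseline[segment]) / baseline[segment]
        if change <= -drop_threshold:
            alerts.append((segment, round(float(change), 2)))
    return alerts
```

A production system would use richer time series modeling, but the shape of the check is the same: learn what normal looks like per segment, then alert only on material departures.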
And this is a big reason why a different approach matters. In my legacy system I might have hundreds of thousands of rules, and let's say a flashy tool like Anomalo comes my way and I'm tempted to deploy it by importing every single one of my rules, and then, in addition, great, I've got unsupervised machine learning to catch the unknown unknowns, the issues I don't know I should be looking for. That's still going to recreate the same problem I had under my legacy system: way too many checks generating a lot of false positive alerts. Now, Anomalo will take care of a lot of those with its intelligent time series models and machine learning, but a different way to think about it, and this is a new way of thinking about data quality monitoring, is to start with the foundation of unsupervised machine learning, so that I'm not getting alerts for things that really aren't significant, and then only create custom checks on top of that for the things that are very critical to the business. These approaches really make sense, and I think the important thing for organizations to understand is that you're starting a program here. That's where most people go wrong; they say, well, you know, I don't really need quality here, I'm just going to build the warehouse and move the data over from my systems, and if it were wrong in my systems, I wouldn't be making money, right? Oh no, it's a very different set of processes. In fact, the average organization keeps its customer data stored across 13 separate systems, which means that integration is always a challenge, and the need for automation around this is huge, because you can't just depend on a Peter or a Zach to actually be there for you all the time; we might hit that $2 billion lottery, right? Exactly. Did we stall long enough to get a question or two? Lots of questions coming in now. Yeah, if you figure out the secret to winning that lottery, please let me know. So, given a limited number of people on an enterprise data warehouse team, and a complex organization with many data sources, both internal and external, what is the best way to deal with data integration? For example, when should data warehousing be used in preference to data virtualization or graph databases, or vice versa? Well, of course, the thing to keep in mind is that none of these are necessarily either/or situations. When you're looking at the value proposition, which is what we're really talking about here, you're saying that in some form the organization is going to benefit from a model. I'll just go back to the slide here, and Zach, if you want to pop anything up, just go ahead, but the organization may look too much like the slide we started out with, where it's difficult to get at the data, all the data is in different formats, we're not really sure how to navigate it, it's mysterious, certainly not very user friendly or accessible. They've got to do something called Pythoning, whatever that is, to go and get access to the data, or maybe we're still running a batch job.
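Picking up Zach's point just above about layering a short list of business-critical checks on top of the unsupervised foundation, here is a minimal, hypothetical Python sketch; the check names and columns are invented for illustration and do not correspond to any real product configuration.

```python
import pandas as pd

# Hypothetical critical-check layer: the unsupervised baseline casts the wide
# net, and only a handful of business-critical rules is declared on top,
# rather than importing every legacy rule.
CRITICAL_CHECKS = [
    {"name": "ticket_price_positive", "column": "ticket_price",
     "rule": lambda s: bool((s > 0).all())},
    {"name": "event_date_not_null", "column": "event_date",
     "rule": lambda s: bool(s.notna().all())},
]

def run_critical_checks(df: pd.DataFrame) -> list[str]:
    """Return the names of critical checks that failed on this load."""
    return [c["name"] for c in CRITICAL_CHECKS if not c["rule"](df[c["column"]])]

# Example: one row has a non-positive price, so exactly one check fails.
sample = pd.DataFrame({"ticket_price": [25.0, -1.0],
                       "event_date": ["2022-11-08", "2022-11-09"]})
print(run_critical_checks(sample))  # ['ticket_price_positive']
```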
So somebody has to take that step, and this requires not just the action of building this new set of system capabilities, but also putting in place a new part of the organization, one that is going to be around in a permanent form, in the same way that your HR program and your finance program are going to be around for as long as the organization exists. Now, if you have trouble with that calculation, it's a very reasonable place to be concerned. What we find is that between 20 and 40 percent of all IT costs are spent doing some sort of data conversion, improvement, or evolution, and getting to the nice architecture I showed in three silly PowerPoint slides is really a matter of years in many organizations. On the other hand, if you get a CIO who comes in and says, no, we are moving to the modern world, it can literally be accomplished in a matter of months, and it's a wonderful, if jarring, transformation. Notice, however, that everything I've said here has nothing to do with filling this thing with anything good, so Zach, that's a pretty good lead to turn it over to you for comment. I guess one thing on this: is this question in the context of self-service as well? That is a good question; we don't know. What if it was? Yeah, go ahead and answer. Right, I mean, with all these ETL pipelines and whatnot, the end goal is to make sure that data is accessible; we need to be able to analyze it and make decisions on it. At the end, from the end users' perspective, they're going to be doing either data science or analytics. For example, in the BI space, we've seen over the past 10 years a large movement away from central-IT-built, structured reporting and dashboards, such as Cognos and Crystal Reports, over to modern visualization tools like Tableau, Looker, and Power BI. And the reason there's been such a big movement is the self-service capability, in which end users don't have to go back to central IT to ask, hey, can you make a small change to this chart? They can just go in there and do visual analytics on their own. So, empowering end users to be able to use the data is, from my perspective as an analyst and data engineer, what this is all about. And on the side of data quality, we also want to enable end users to create their own checks; we want to democratize data quality in some way, so that the experts in the data, the people who actually use it on a day-to-day basis and know what's really important, can add their own checks without having any coding experience. And some of the metadata in the platform would also be useful in the sense that you could use it for sort of meta-information: tracking usage, various areas of focus. Yeah, as long as that is actual data that can be monitored, then yeah, why not. Data quality is mostly concerned with the actual data itself, rather than usage, but the same thing applies. Not necessarily self-service. Sorry, what was that? If it's not self-service. Yeah. Peter, if you want to add to that too.
Sure, so you have a challenge around that. As Zach said, the reason self-service is appealing is that people can make their own changes, and they're most capable of understanding their requirements best because they're closest to the business problem they're trying to solve. This has been a goal of data management all along: to make sure that the right data is in the hands of the right users under the right circumstances. If we're forced to go back to a centralized model where only IT can make changes and then propagate them to the systems, there are sometimes perfectly good reasons for that; for example, in the military most systems are of that fashion. I really don't want soldiers on the battlefield messing with settings, because it will distract them from their primary motivation, which is obviously to get out there, keep themselves safe, and make sure the world is safe for democracy. By the way, happy voting day, everybody, it is that particular day. Do you think the data vault model should be created after the enterprise data model, with the main models as the building blocks, along with metadata? Otherwise, I'm not clear how the data vault would comprehend the, quote unquote, business rules, since everything could be many-to-many hubs and satellites. So, in the context of this, that is exactly what's envisioned: take an enterprise model and go directly into the data vault components, and if you ever take Dan's training in that area, he'll show you specifically what the power of that is, because you're going from existing, known facts. I mentioned before, and drew a line around earlier on, the ETL metadata that you have there. That becomes a tremendous source of inputs, because you've already made it run; the ETL job runs every night, or periodically, or whatever your frequency is, and having that agreement between the beginning and the end party about what they're supposedly looking at, in the same context, means you've got a tremendous amount of source material to start with. You'll also find, unfortunately, that some very well-meaning but perhaps not optimal data practices are embedded in there, because they didn't have the opportunity of using something like Anomalo, where they could actually have active monitoring on the outside of this. So that's probably where it would fit on my diagrams, but one could say it should be assumed as a best practice in the warehouse data components. So yeah, sort of a long-winded answer, but I think that'll do. Zach, do you want to add anything to that? No, nothing for me. Very good. Those are all the questions that I have right now. Anything you want to add, Peter? I'll give everyone a moment to type in anything else. I really do want to go back and say that warehousing is a wonderful thing, but warehousing by itself, generally, is not going to solve your problem; it's a combination of the things that we put on the DMBOK wheel. Some combination of warehousing, governance, and quality is probably a good place to start, but it may also incorporate metadata and reference data at different stages. Depending on what you're doing, obviously, you can go into other aspects; that's not to say we're ignoring data storage, because obviously if you're building a warehouse you're building data storage. But these things are really capabilities and should be looked at as capabilities, and where most organizations make the mistake is that they come in with a series of use cases, and while use cases are useful, by themselves they're not enough.
And so the idea is that if you're being presented with a group of use cases, that's a wonderful thing, and we're glad somebody has done that work, but if they're not using an integrated glossary to categorize things and to define a concrete, constrained vocabulary around the projects, there is no hope of this thing working, and you'll end up in the same sort of build-it-and-eventually-it'll-work situation, rebuilding seven times over too long a period of time. Which comes back to: we invested millions in this, not just in dollars perhaps but in person hours and things like that; are we ever going to recoup our investment? And the higher the investment, obviously, the more difficult it is to get it back out. So the question is, can you use your existing smarts and appropriate technology to go in and not just build a good data warehouse, but build a good data warehouse that's got some good quality data inside of it as well. I really don't have anything else on my side unless there are questions around data quality. Yeah, well, I did get another question coming in, not necessarily on data quality: what are the limitations of using warehousing to source data for production applications versus analytics? I think they may be different based on those two classes of applications. Zach, maybe you can add your thoughts to this too, but you know, it's just a set of constraints. If we're delivering something that is a production feed into another part of an organization, which is very much a part of the Inmon warehouse type of integration, and again I'll just go back to the target slide here, you can see they're connected with an awful lot of different components around their environment, just as an example of one way it can be done. Having that without the ability to talk about quality around it, and without being able to look at it in terms of whether we have really good, solid data in there, is just unfortunately crippling to organizations. At the same time it can become a very difficult muck, quicksand if you will, where you've put so much into it that you feel like you have to get something out of it, but all of a sudden you realize you have $30 million a year in bills and one user, and that is not a good ratio in that set of situations. Perfect. Well, that does bring us to the end of the questions; again, everyone's very quiet today. It's a good topic, though, that we were talking about, Peter, so essential to the whole discussion on data management and such an important piece of that DMBOK wheel. But thank you, everybody, for all the engagement, thanks to Anomalo for sponsoring today and helping make these webinars happen, and Zach, it's been a pleasure to have you join us today. And thanks, everybody, I'll give you a little bit of time back. Thanks everyone. Thanks, Peter, thanks. Thank you. Thank you. Everybody have a great day.