And here we go. Hello and welcome. My name is Shannon Kemp, and I'm the Chief Digital Manager of DataVersity. We'd like to thank you for joining the latest installment in the Monthly DataVersity Webinar Series, Advanced Analytics with William McKnight, sponsored today by Looker. Today, William will be discussing how to improve your analytic data architecture maturity with machine learning. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag ADVAnalytics. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just click the chat icon in the bottom right-hand corner of your screen for that feature. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me turn it over to Joel for a brief word from our sponsor, Looker. Joel, hello and welcome. Hey, Shannon. Thank you so much, and thank everyone for joining us here. My name is Joel McKelvie. I'm with the Looker team here at Google Cloud. And just to give you a quick preamble to what William's going to say, I'm going to talk a little bit about why we're having this conversation, and then share a little bit about Looker for you as well. So if you have any questions, feel free to enter them in, as Shannon said. And you can also email hello at Looker.com, and we'd be happy to answer questions about Looker directly that way as well. The use of data is more important than ever.
And as data specialists and data teams, I think we probably naturally understand it better than most, those of us on this call: understanding how insights can make a difference to our business really is key to our jobs. And helping the organizations we work with drive growth, and create real experiences with data that drive business change, is very, very important. There are a lot of statistics around that we could go into, but I think this Forrester statistic about succeeding during disruption is particularly relevant right now, with all the change we're seeing in the world: the COVID pandemic, the changes in the market, the changes in our businesses. Insight-driven businesses, businesses that use data to really drive their business, are 2.3 times more likely to succeed during disruption. And so what we're seeing is that businesses using data as fuel to succeed during these kind of strange times in our markets and our businesses are doing better. They're outperforming other businesses, and they're even growing during this time of, in many cases, industry downturns. So we know data is important. And one of the ways we can see that data is important is that people are rebalancing where they spend; at least according to Gartner, spending on IT and technology is really shifting towards business intelligence and data analytics. And William's going to talk a lot about machine learning and analytics excellence in your organization. This is a place where many, many companies are focusing their spend and their attention in 2020, 2021 and beyond, because it is so key to the survival and competitiveness of businesses, particularly during a downturn. So I don't want to beat you to death with Gartner and Forrester statistics, but I just wanted to share with you some of the relevance of doing well with data, of being mature analytically as a business as you move forward. Now, Looker is very familiar with this space, right?
We're now a part of Google Cloud, but we've got thousands of developers, thousands of customers who are every day trying to integrate data into how their users, whether those are external customers or business users inside their business, use and experience data. So we've coined this term of data experiences as a new way to think about how people consume data. In a lot of cases, it might be so native to how they do their work that they don't even necessarily know that there's some type of analytics behind it. And one of the things that Looker is very good at, although we are part of Google Cloud, is working in a multi-cloud environment. So we support a wide range of environments for deployment and use of Looker: certainly Google Cloud, but also Amazon, Microsoft, Snowflake, IBM, and others, right? So that's just a little bit about Looker. Where does Looker sit, though, in a data stack? That's really important. Regardless of what database you're using, if that database can speak to Looker, either natively or through a SQL connector, Looker can connect to it and use the power of that database to drive queries for your organization, right? So data becomes modeled, governed, and accessible via Looker, regardless of the data source or the database you're using. And then built on top of that is what we call these data experiences that people use. Maybe the most simple and common one we're all familiar with is reports, dashboards, even data exploration as a form of business intelligence, which is really the traditional way of consuming data. Looker certainly does that. We supply really world-leading business intelligence functionality to all of our customers, allowing you to visualize, but also explore, highly granular data. One of the things that's different about Looker from tools you might be used to is that Looker doesn't pull that data out of your database, right? It's an in-database architecture.
So as you analyze data as an analyst or a data team, you have that real granular data, fresh and accurate from the database, that you're doing the analytics with. No need for extracts or staging areas or operational data stores, right? We connect directly to your database to do that, and to drive really solid business intelligence analytics, automated reporting, and all the rest. But we also do things that go beyond modern BI, beyond reports and dashboards. This is what we mean when we say data experiences. For example, we can integrate insights right into something like Salesforce. So if you have a team that uses Salesforce, you can get analytics driven by Looker available to you right within Salesforce. Your sales reps have more context, more information. The insight is integrated right into the workflow that they're using, without them having to think about going into another tool, or even into Looker, to do something. It's just right there for them. And speaking of workflows, you can really push information, data, and insights via Looker into workflows as well. So we can automate a lot of interesting things; one of our customers does this around email campaigns. We can automate a workflow where, if a customer is less satisfied, we send them some more information or help them out with where they are. And that can all be automated through alerting, through taking action on data. Data-driven workflows are really about shortening the time between an insight that you might derive from analytics and the action that you can take on it, in a way that's native to the user. And again, this is the idea of data being so integrated, so integral a part of what your business does, that, say, your customer success person wouldn't necessarily know that they're using an analytics tool to get the insights that they need.
Last but not least, Looker has a really robust API and some really robust extension capabilities that allow you to build tools like some of the major retailers we work with have: a custom application, custom portal, custom website, or custom mobile app that gives you this great monetization capability, but also great analytics capability: native, insight- and data-driven applications for users. And I don't want to beat this to death too much, because I know we want to get to William here, but you can think about these four colored blocks here as different experiences, different ways you, as a data team, deliver data to customers, whether they're internal customers or external customers to your business. For those people who need data to do their jobs, or who are paying your organization for data, the way we can get them data with Looker is far more flexible and has far more impact than traditional report and dashboard capability. We can certainly do reports and dashboards without any question, but with Looker working directly with the power of your database, we can really unlock new ways to use data and new ways to drive value. One of those ways, and I'm just going to talk about this for a moment, is machine learning. Data science and machine learning workflows are incredibly complementary for Looker, or Looker's complementary for those workflows, because Looker provides a single surface, an API, by which you can access data right in that database, but also access model metrics and version-control data sets. So you can not only extract something like a training data set from your database and then use it to train an algorithm; you can also extract test and historic data sets using Looker, and, once you've run your algorithm against them, compare all those data sets in Looker and see how the algorithm performs versus known results, versus expected results.
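To make that workflow concrete, here's a minimal, hypothetical Python sketch. It assumes you've already extracted labeled test and historic data sets (for example, via a BI tool's API) into simple lists of (input, known result) pairs; the function names here are illustrative, not part of any Looker API.

```python
def accuracy(predictions, known):
    """Fraction of predictions that match the known results."""
    if len(predictions) != len(known):
        raise ValueError("prediction and result sets must align")
    hits = sum(1 for p, k in zip(predictions, known) if p == k)
    return hits / len(known)

def compare_datasets(model, datasets):
    """Score one trained model against several labeled data sets
    (e.g. test vs. historic) and return a name -> accuracy map."""
    return {
        name: accuracy([model(x) for x, _ in rows],
                       [y for _, y in rows])
        for name, rows in datasets.items()
    }

# Toy usage: a threshold "model" scored against two extracted data sets.
model = lambda x: 1 if x > 0.5 else 0
datasets = {
    "test": [(0.9, 1), (0.2, 0), (0.7, 1), (0.4, 1)],
    "historic": [(0.6, 1), (0.1, 0)],
}
scores = compare_datasets(model, datasets)
```

The point is simply that, with all the data sets accessible through one surface, comparing a model's performance against known versus expected results becomes a small, repeatable step.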
So a lot of our customers are using Looker as an integral part of their machine learning and data science workflows, so that they can build better algorithms, work more closely, or operationalize machine learning in a way that's very, very powerful for their organization. So with that, I'm going to leave you very quickly, but if you do want to know more about Looker, just reach out to us. Hello at Looker.com is a great way to do it, and you can always just go to Looker.com and find information. Oh, one other thing you could do: if you want to see it in action, go to Looker.com slash demo and we'll reach out to you. We can set it up and show you something that's really relevant to specifically what your needs are or the things you might want. So with that, Shannon, I'm going to hand it right back to you and we'll pass it over to William. Perfect. Thank you, Joel, so much for kicking us off, and thanks to Looker for sponsoring. If you have any questions for Joel, feel free to submit them in the Q&A for us to answer at the end of the webinar today. Now let me introduce our series speaker, William McKnight. William is the president of McKnight Consulting Group. McKnight Consulting Group focuses on delivering business value and solving business problems, utilizing proven, streamlined approaches in information management. His teams have won several best practice competitions for their implementations, and he's been helping companies adopt big data solutions. And with that, I will give the floor to William to get his presentation started. William, hello and welcome. Hello, and thank you, Shannon, and thank you, Joel. It's always great to have Looker as a sponsor of this webinar series. Welcome, everybody, and a special welcome to everybody who's on the West Coast, looking out over orange skies and smoke. I'm here in Texas, and while I would love to see an orange sky, I would like it without the fire and smoke. I don't think that's possible.
How very 2020 of it to go ahead and do that this year. But all is not lost on 2020. As Joel mentioned, there's a lot of investment happening right here in data, right here in data and analytics. And this is good timing, I think, to have this presentation, because about three months ago I gave the talk on data architecture maturity. According to your feedback, you really liked that idea and some of the things presented there. I believe it's available on YouTube if you want to catch that presentation. Of course, I won't be repeating it, but here I do get a chance to go into a lot of the tactics behind moving up the data maturity model. The full title here is How to Improve Your Analytic Data Architecture Maturity with Machine Learning. So I'm going to be answering that question, and I'm going to be giving you some specific examples of some major applications that most of you are doing in your enterprises somewhere, and how they might be improved with machine learning, because I think that's the way to do it: drive it right to the bottom line of the business. So that's what we're going to do, but let's start by reviewing a little bit; review for some of you, new for others of you. This is maturity level three. Now, this is what I said everybody should get to quickly, like now; have plans to be there this year. Things are moving so rapidly. And one of the things, and I'm not going to belabor this, one of the things was to be all in on AI in your data strategy, and that means a lot of things; I could just put a few words on the slide. Actually, we'll probably be developing that quite a bit today, so I look forward to doing that with you. Architecture-wise, you've got your data warehouse, of course. Finally, you've got your data quality above the standard there. You've got your plans. You've got your lake; I had another session on the lake idea, and you've got that in place.
You're doing streaming, not just ETL; doing more ELT than ETL; leveraging IP in your source-to-target modeling, so you're not doing it all from scratch; you're using third-party data; and you've got the lakehouse concept, the EDW accessing the data lake. I'm finding this to be a very mature item. And again, how we do our maturity model is we look at you. We look at what you're doing, and what the more progressive and, how shall I say, successful companies from a business standpoint are doing, and we back into all of this. Technology-wise, you've got your graph databases in place for those types of things. You might be saying, wow, that's a lot we're adding up here. Yes, yes, we are. But keep in mind, I'm talking about larger enterprises. We're talking about somewhere in the company. Okay, maybe not you, maybe not your project, but somewhere in the company we can definitely find some relationship requirements that graph databases would meet the best. And I've tried to establish throughout this series that to do the best, you have to have the best architecture in place for that function, and it's tricky and requires a lot of good skill. And I also put MDM in there, and you're cloud-first. Okay, you've got your data catalog started, data governance getting going here, organizational change management: yes, you're attending to the people needs of your projects. You've got some titles in place: chief data officer. You've definitely got data scientists at this point, and strong DevOps. And hopefully you're at level three, but you probably aren't, because I would say about 25% of companies are here, and these are going to be the winners. So let's get here. And level four? It's level three plus the things you see here. Data as an asset on the financial statements; you knew it had to go somewhere for us to get to this level. So that's a goal for you right there.
All development with an architecture; not a lot of rogue development. Obviously you've got a nice balance of centralization and decentralization in place, but there's not a lot of rogue development that's going to serve a one-off purpose and nothing else. If you keep doing that, you will get stuck in terms of maturity. You're also measuring your lake maturity at level four. You're measuring it to a model, to your model, to what makes sense for you, and you're trying to grow it. You've got dynamic plans. You know they're going to change, so they're dynamic, but you've got some plans in place for where you are going. You've minimized the cubes, which aren't necessary anymore; they've served their purpose. Now you've got MDM expanding. You're looking at things like GPU databases. You've got your data catalog populated now; it's not just sitting there while you wonder, what do we do with this? And you're almost all in on the cloud. Yes, you had the cloud plans at level three, but now you're actually almost all in there on the cloud. Now keep in mind, we're down to a good 12% of organizations that are here, but these are definitely going to be some winners. Data governance by subject area, across all major subject areas; so you're doing well with data governance here. You're doing well with organizational change management. Now, in addition to all those other chiefs, you've got a chief information architect, because information architecture has a seat at that table. Full data lineage: you know where the data came from, what happened to it, how it got to wherever it is at any point in time. Not just DevOps, but you have MLOps; I'm going to come back to this a little bit later. And finally, maturity level five. Why not? Let's just finish it. This is going to be only some of you right now, a handful, but data strategy: you're an AI organization. You can say with a straight face that AI has changed your organization.
And whatever we do, we're really an AI organization. Remember, we used to hear that about companies that were built as AI organizations. That's great, and I'd say you also can be that. At level five, you're an AI organization. Hyper-personalization; prescriptive, not just descriptive, analytics; and you're developing information products, products with your data, selling data appropriately. Architecture: data infrastructure is a platform with domain mastery, microservices, and containerization. Analytics architecture: your ETL has become, I'd say, almost fully automated at this point. Technology-wise, obviously very progressive here. And then MDM, master data management, across the board. And finally, you've got data governance across the board; it applies to everything, all subject areas, and it's pervasive. You've got your catalog populated, et cetera, et cetera. You've got your rules disseminated to all applications. And you've got security wrapped up in there. And so people say, well, why is data governance coming into this model so late? Well, I don't know; it is. It's hard. You may say, well, it looks a little easier than the rest, but it involves people, and it does tend to be hard. So you're not really fully there until level five. So that was a little sneak peek back at the maturity model. Hopefully you kind of know where you are: one, two, three, four or five. You can skip over one and two; they're kind of boring. We want to move into three, four and five. Now let's just drill in on machine learning, because I am putting that up for you as the way to move up your analytic architecture. Joel did some of this as well. Your machine learning pioneers out there are locking in now. They are letting the data speak. They are not just, oh, I don't know, making decisions based upon the HiPPO approach. You know the HiPPO approach: the highest-paid person's opinion. They are actually listening to the data.
Of course there's room for that leadership, but great leadership today understands the limitations of human scale and the facts that are inherent within the data. You're using statistical models, machine learning for sure; that makes you a machine learning pioneer. You're generating deep business implications from the work, not shallow, but deep. And you deal in algorithm management within your company. You are trading algorithms. You have MLOps. You are sharing algorithms across the enterprise. Conversations go something like: do you have an algorithm for this? Let me trade you this algorithm I did for that, et cetera, et cetera. And I do want to point out, and this is going to become clear in the applications that I share with you, that ML pioneers acknowledge the human scale, the human-scale limitations, and where things can go when you get beyond that. So the first wave of machine learning leaders out there are emerging. They're emerging today, and they're going to get exponential benefits, because they are latching on to something that has legs, that's going places, that's really tied to corporate significance and success. And when you do that, and the cloud's another one, and we could probably rattle off three or four here, but when you latch on to those things that are going to stick around and really make a difference in the future, and you get on to them early and you do well, you are doing great. Now, it's kind of a winner-take-all approach, but don't be dissuaded there, because if you're a fast follower and you get into this, I'd say, you know, this year or so, you're still in that range of being able to continue to be successful as a business. So: ML in action in the enterprise. These are just some examples. I'm not going to go through all of them, but I do want to define some terms here. Are these machine learning projects, necessarily?
Well, if they're using machine learning, I'm calling them machine learning projects, but you can clearly do a lot of this, like financial fraud (we're going to drill in on that, actually), fraud and call center experiences, without machine learning, right? We're all doing this. We have to be in business. You have to do all these things as appropriate to your particular industry. But are you doing them with machine learning, which is the modern, elegant, efficient way to do it? Now's the time when I see a lot of companies breaking into upgrading their applications, from whatever they were before, to machine learning. So specifically, I'm going to go through four of them here in a bit, and you'll see what I'm talking about. So I like to keep things simple. How to improve your analytic data architecture maturity with machine learning: that's what you came to hear about. Well, here's the answer: improve your applications with machine learning. We've got to start getting machine learning into your applications. Now, I definitely see machine learning spinning up all forms of new applications, ones that we have not been doing because they didn't make sense without machine learning. But what I'm focused on here today is where, in my primary consulting practice, I see projects that are using machine learning, what they're doing out there. And that is within an application set of things that you're already doing for your enterprise. Let's just do them better. Shift them from only data warehouses and lakes and ETL, which stands for egregious, torial, and labor. Now, when you repeat this down the road, remember you heard it here; you heard it here from McKnight. And move from that to data fabric, to artificial intelligence and pipelines. And so that is the new foundation of the architecture, the analytic architecture, that I'm going to show you here. Let's start with customer 360 projects.
Like customer churn. You're doing customer churn management. To do that at scale requires machine learning today. Now, in the old stack, and where some of you probably are, no doubt, you're looking at shallow things, like spend. And when you see that spend trailing off, you begin to think, well, this customer might churn, so let's do something about it. And that something you do comes from a very small list; maybe there's one thing that you do about it, like send them a promotion. It has to be much more nuanced than that to be effective today. True analytics in customer churn projects, in customer 360 projects, will be enabled as the organization embraces the fact that all data is predictive, and not just descriptive of the past. And when it comes to churn, corporate emphasis and incentives must reflect the real value that individual customers bring to the business. So when you're talking about churn, it's not just that somebody looks like they're going to or they're not going to. There are many questions that have to get put on the table here. Like, do you even care? Do you even care if they churn? Do you want them to churn? That is actually on the table today for many companies as they look across their customer base, especially telecoms and financial companies that have millions and millions of customers. Each individual customer has that question associated with them. And finally, what do you do if you decide that they look like they're going to churn? And by the way, looking like you're going to churn is not a yes-no, right? It's a probability, anywhere from 0 to 100 percent. And you also have to kind of throw the date in here. You know, when might they churn? And when do you start caring? If they might churn in a year on a 10% basis, you may not care about that. But if they're likely to churn next month on a 90% basis, you might care about that. So if you do care, what do you do? What do you do?
And again, it's not as simple as there's one thing that you do to those people or those companies. There must be many things, and there must be things that are geared appropriately to that particular customer. You have to look at the linkage between factors and customer behaviors. It's difficult to establish, and it requires an iterative, experiment-driven approach where you model: you set up your model, you set up your data environment, you set up your pipeline, such as what you see here. In this scenario, customer loyalty factors may be fed back into the model as variables, for example, testing the effectiveness of historical loyalty schemes. Now, as with all of these projects, as we're stepping into machine learning, one thing that's going to be evident is that it's going to require streaming data. Streaming data, not just ETL; getting all of the data in, and getting it in real time. Streaming is kind of associated with real time now, isn't it? Getting data in real time so you can make real-time decisions. You will also likely, for all of these projects, as you step into machine learning, unless you've been great about it up till now, need to remediate some of your data infrastructure and bring in more data. In this case, the customer detail data, all the way down to levels of detail that are as granular as possible. Data governance will be required to catalog the data and ensure it is to a quality standard and in compliance with established privacy regulations. You will need operational databases for executing the machine learning models. Data security, obviously, is paramount across the sensitive customer data in this model, and workload management is going to be needed to organize the model execution. These are some things that are going to be required for all of these projects as you step them into machine learning.
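As a toy illustration of the churn considerations above (the probability, the time horizon, and whether you even care about this customer), here's a hedged Python sketch. The thresholds and action names are invented for illustration; a real system would draw its actions from a much richer, customer-specific list.

```python
def churn_action(probability, months_out, customer_value,
                 prob_floor=0.5, horizon_months=3, value_floor=0.0):
    """Decide what (if anything) to do about a predicted churn.

    probability    -- model's churn likelihood, 0.0 to 1.0 (not a yes/no)
    months_out     -- when the model expects the churn to happen
    customer_value -- what this individual customer brings to the business
    """
    if customer_value <= value_floor:
        return "let churn"      # we may not even care if they churn
    if probability >= prob_floor and months_out <= horizon_months:
        return "retain now"     # likely and soon: act on this customer
    if probability >= prob_floor:
        return "watch"          # likely but distant: keep monitoring
    return "no action"
```

So a 90% likelihood next month triggers action, a 10% likelihood a year out does not, and an unprofitable customer may be allowed to churn, which is exactly the set of questions the talk puts on the table.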
So, hopefully, you've heard about your customers, and that made some sense to you, but here's another one. Some of you have hardware, actual physical assets, and in the old stack, for maintenance, you would look at it manually and decide, or you'd make light use of data, or maybe it's arbitrary: well, every six months we replace that part, et cetera, et cetera. Well, obviously, that's not ideal, and we've moved toward the ideal as the technology possibilities give us access to it. So, predictive analytics. Now, this applies to all kinds of machinery operations, fraud detection, marketing campaigns, and all kinds of risk management; that's really what it is. So it has some correlation to the customer churn project we just looked at. Predictive maintenance stores sensor data from equipment in real time and predicts equipment failure ahead of time to provide continuity of service. So, again, we're going to need data in real time. As a matter of fact, we're even more so going to need that data in real time. And it's a huge business win not to be fixing parts based upon a schedule, but to be fixing them based upon predictive maintenance born of data. That considers: what is the downside if we don't fix it? You might be able to live with that downside; it might be okay for a while. And where is the part available? I think a lot about airlines and airplanes. They're almost all in circulation, right? They're almost all in circulation to the point where there's no built-in wait time for potential problems. So we have to consider planes moving around city to city. Where is the part available? How long will it take? And what is the downside of waiting for that fix to occur at a given airport location? What is the ROI of that? How many passengers are we going to delay, and what does that cause within the system? These are the kinds of things that are beyond human scale.
And you want to put these into machine learning, give them over to machine learning to make some of these decisions for you. Now, streaming data obviously is essential here. Streaming data ingest will be required for a number of functions and program flow elements. The streaming is where several of the application arguments are set: the normal expected reading of the sensor, the expected variance of the sensor reading, the iterations-to-failure target, and the target reading for a sensor in fail mode. That's a lot for every part, and most companies who have parts that require maintenance have upwards of millions of parts. The program then generates the risk scores, and then sleeps and waits for more readings to be generated. It loops until the program stops generating output, retries a specified number of times, and then quits. So, what do we need to do to bring our predictive maintenance up to the standard? It's a lot of the things that I mentioned before when it came to customer churn. The models need to be built, trained and deployed. The company's data warehouse may need to be remediated to get the product detail, in this case, the product details necessary. And, blah, blah, blah about data governance: yes, that's going to be required for quality and compliance, if you will, and data security. Operational databases will be necessary as well, as will workload management. All of that's going to be true here as well. Right? Seeing a pattern? And after the next two applications I'm sharing with you, I'm going to get to some of the enablers of being ready for doing this to your projects. So, this is fraud detection. The old stack, well, the old stack produced models that were late. The fraud had already proverbially walked out the door, if you will, which makes it much harder to fix, and there were a lot of false positives and false negatives. So we want to get to where we're certain, or more certain anyway.
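The sensor loop described a moment ago for predictive maintenance (set the normal reading, the expected variance, and the fail-mode target; then score, sleep, retry, and quit) could be sketched roughly like this in Python. This is a simplified, single-sensor version under assumed conventions; a real system would run something like it per part, fed by a streaming platform.

```python
import time

def risk_score(reading, normal, variance, fail_reading):
    """Score a reading from 0.0 (normal) up to 1.0 (at the fail-mode reading)."""
    span = abs(fail_reading - normal)
    drift = max(abs(reading - normal) - variance, 0.0)  # drift beyond expected variance
    return min(drift / span, 1.0) if span else 0.0

def monitor(read_sensor, normal, variance, fail_reading,
            max_retries=3, sleep_s=0.0):
    """Loop over sensor readings, emit risk scores, and quit after
    max_retries consecutive empty reads (the retry-then-quit flow)."""
    scores, misses = [], 0
    while misses < max_retries:
        reading = read_sensor()
        if reading is None:        # no output this cycle
            misses += 1
            time.sleep(sleep_s)    # sleep, wait for more readings
            continue
        misses = 0
        scores.append(risk_score(reading, normal, variance, fail_reading))
    return scores
```

A reading drifting toward the fail-mode target raises the score; whether to act on a given score is then exactly the downside-and-ROI question discussed above.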
So, fraud detection is exponentially more effective when risk actions are taken immediately, when you're able to stop the fraudulent transaction instead of acting after the fact. So this must be real time. This must be real time. And it must be proactive: the ability to see antecedents well in advance of the fraud and pattern-match them to current activity, so you nip the fraud in the bud. Fraud detection is more complicated today, of course, but so is the fraud, and so it's really necessary that fraud detection be more complicated. Fraud today, a lot of times, is not carried out by an individual or a small group. It's a worldwide endeavor that's coordinated for attacks at the same time, and it really takes some good fraud detection to know that. It must be beyond that human scale; it must be into that machine learning. And keep in mind, whenever you're looking at a transaction, whether it's fraud or not, we're still at the point where it's not a yes-no; it's a probability. And again, I'm going to say it's anywhere from 0 to 100 percent as to whether we're calling this a piece of fraud that we want to do something about. And what do we want to do about it? Here again, the actions that we might end up taking about what we're perceiving as fraud, there are many. Of course, we want to stop the transaction. Well, I say "of course," but that might not be the end of it, of course. So there are many actions that we might want to take. The solution must ingest data in real time, the streaming data ingest taking data in as the generator produces it, analyzing the information in real time and sending outlier and anomaly detection alerts to the machine learning, which will decide: okay, what do we do about this? Now, the final project I'm going to share with you is supply chain optimization. And for those of you that have a supply chain, you are definitely doing this.
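As a toy stand-in for the real-time outlier and anomaly alerts just described, here's a sketch using a rolling window and a z-score threshold. Real fraud models are far richer (they have to catch the coordinated, worldwide patterns mentioned above), but the shape is the same: score the transaction as a number, not a yes-no, then decide what to do. The class and thresholds are invented for illustration.

```python
from collections import deque
from statistics import mean, stdev

class AnomalyDetector:
    """Flag transactions whose amount drifts far from recent history."""

    def __init__(self, window=50, threshold=3.0):
        self.history = deque(maxlen=window)  # rolling window of recent amounts
        self.threshold = threshold           # z-score that triggers an alert

    def score(self, amount):
        """Return a fraud score (z-score vs. the recent window), not a yes/no."""
        if len(self.history) < 2:
            self.history.append(amount)
            return 0.0                       # not enough history yet
        mu, sigma = mean(self.history), stdev(self.history)
        self.history.append(amount)
        return 0.0 if sigma == 0 else abs(amount - mu) / sigma

    def alert(self, amount):
        """True when the score crosses the alert threshold."""
        return self.score(amount) >= self.threshold
```

The alert here is only the trigger; as the talk notes, stopping the transaction is one of many possible actions, and choosing among them is where the machine learning earns its keep.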
You didn't just put up your supply chain and say, come what may, we're never going to look at it again. Of course not. But the question today is: are you using machine learning effectively in that supply chain? In the old stack, we looked at one channel at a time. We did not include downstream processes and operations and all the various dynamic layers that there are in supply chain optimization. But now we're able to do multi-channel demand, real-time forecasting, and inventory tracking. This is all necessary to prevent gaps and breaks in the supply chain, and a company needs to conduct deep analysis that leverages both the hot data (the data originating in real time from multiple plant-floor assets, as well as supplier data) and the colder data. So in the technology stack, a strong platform is critical in forming a historical data baseline. You may need to be building up the data over time so that you have that historical baseline to develop your models from: developing models from that historical data and then deploying the models into the supply chain. The solution will enable creation of a holistic view across both the data of the electronic systems and, whatever your industry is, a semiconductor manufacturer would say the semiconductor components they contain, allowing for quick identification of the root cause of any supply chain defect. The data in the warehouse will need to be streaming, of course. We'll need that operational database. We'll need data security, workload management, and data governance in order to get it all done. So those are some things to do, and hopefully I teased out a little bit of the benefits there would be in moving these projects to machine learning. Are there more? Of course there are. There are probably a good 100 profiles of projects that could use machine learning, maybe even within your organization. Let's talk about some of the foundations for getting there. These you can do anytime, even if you're not ready to move those applications up. 
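The "hot data plus colder historical baseline" idea above could be sketched as a toy per-channel forecaster. The blend weight and window size are assumptions for illustration only; a real forecast would come from trained models, not a fixed average:

```python
from collections import deque

def baseline(history):
    """Cold data: a long-run average demand built up over time."""
    return sum(history) / len(history)

class ChannelForecaster:
    """Blend recent 'hot' readings with the historical baseline for one channel."""

    def __init__(self, history, hot_window=3, hot_weight=0.6):
        self.base = baseline(history)
        self.hot = deque(maxlen=hot_window)  # recent plant-floor/supplier readings
        self.hot_weight = hot_weight

    def observe(self, demand):
        """Ingest one real-time demand reading."""
        self.hot.append(demand)

    def forecast(self):
        """Weighted blend; fall back to the baseline before any hot data arrives."""
        if not self.hot:
            return self.base
        recent = sum(self.hot) / len(self.hot)
        return self.hot_weight * recent + (1 - self.hot_weight) * self.base
```

With a historical baseline of 100 units and recent readings of 120, 130, and 140, the blended forecast lands at 118: the hot data pulls the estimate up without discarding the baseline.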
Get your data scientists. Part business analyst, part highly skilled programmer, part high-level statistician, and part industry and company domain expert. Yes, all of the above. They're difficult to find. The true ones are difficult to find. They're going to go deep; they're not just doing superficial analysis. Now, I have no problem with superficial analysis. By the way, that must be done as well, and I have no problem with people that do it. As a matter of fact, I have no problem with people that do it actually becoming data scientists over the course of time, as long as they take a different approach to the job. And I think that's great, actually. But it can be a lengthy, non-linear recruitment process, and they're difficult to retain. The top skills: high-skilled data analysis and interpretation, data architecture, data modeling, and AI and machine learning. Machine learning is going to be the top skill of the data scientist; make sure they have that skill and they're growing that skill all the time. Another enabler is the Data Lake. Now, I spent a whole webinar on the Data Lake; I won't do that today. But it is the data science workbench and, as a secondary role, staging for your data warehouse. It is where the majority of the analytics are being done by data scientists today. I have a graphic for you. Today, of course, we all have our data warehouses, maybe more than one, up and running, and we have our analytical applications there, and we've got most of our users on the data warehouse. But we have the Data Lake over here as well. And now we're starting to see more analytic applications being built on the Data Lake, and although there are fewer users and fewer data files there, this is going to grow. And how is it going to grow? Well, everything's growing. It's not like just the lake is going to grow and the warehouse is going to shrink. I don't see that. 
I see, over the course of the next five to ten years (which I call the actionable future, the future that we're taking actions on and making plans for today), that the data warehouse is going to grow and the Data Lake is going to grow, perhaps into multiple lakes and so on, as long as they're working together. As a matter of fact, speaking of working together, a very important concept when we're talking about the warehouse and the lake is the Lake House concept. The Lake House means that the data warehouse can be that primary point of interface. We can put a Looker on the warehouse, and it can reach over into the lake for the data it needs that's not in the warehouse. And furthermore, some of the analytics that we build in the lake can be shared back into the warehouse. So it's a little ecosystem we have going on here, and it's going to get to the point where they are truly merged and where the users, be they analysts or scientists, don't really, shouldn't really, ultimately know or care what's going on back there. What's going on back there today, which I think is another key, is the idea of the Lake House. So this is all going to grow. We need this data for upgrading our applications to machine learning. We need all this data and we need all these domains. You'll need many data domains, and you need them mastered; I'll talk about mastering in just a minute. Here are some more example machine learning applications in marketing, cyber security, smart cities, etc. And what I have over here on the right are some data domains. Now, these are just some typical ones. Obviously my list could go into the hundreds if I considered all industries and so on. I've done this exercise many times for my clients, where I will list out the applications that are on their roadmap (obviously it wouldn't be cross-industry there, just whatever they're going to be doing) and cross-reference that to the data domains. 
This highly informs your strategy. It highly informs your roadmap. What are you going to build next? What data domain are you going to build and master next? Now, a lot of us have customer mastered to some degree. We have product mastered, whatever that means to you, to some degree. But now we're at the point where we're getting into more. Leading organizations are beyond mastery of customer and product; they're into all these things and more. So hurry up and get your customer and product mastered, because just about all the applications that I showed you today, and that you know need ML in your organization, are going to need them, and they're going to need more. So do this exercise; it's very eye-opening. And what do I mean by data's ready? We need to get all that data ready. Well, here are my bullets for when data is ready. It's in a leverageable platform. It's not in a data mart under somebody's desk, or a data mart that is not shareable within the organization. It's in an appropriate platform for its profile and usage. One of my big themes is: get the data into the right platform to succeed. You can't put all data everywhere. You can't put all data in the lake. Well, you can do that, actually; obviously not all of it, but a lot. You can put most data in the lake. You cannot put all data in every warehouse, every mart, every this, that, and the other analytical structure that you have out there. What's the point of that? They're going to have to work together, and you're going to have to make judgment calls along the way about what goes where. And that will have to do with the profile of the data: size, complexity, usage characteristics, et cetera, et cetera. So data is ready when it has high non-functionals (availability, performance, scalability, stability, durability) and it's secure. And we could spend a lot of time talking about those; they are very important. Performance is very important. We want good out-of-the-box performance from whatever we do. 
And that's why the second bullet is there: get that data into an appropriate platform. Data is captured at the most granular level, and data meets your data quality standard as defined by data governance. Not some arbitrary standard, not some "William said it has to be at 90 percent, whatever that means" standard, but what makes sense for you for the data in that platform. And don't say 100 percent, because that's hard. That's really hard. You're going to need data science modeling: the ability to evaluate various models and algorithms. Here's your little starter set: classification, clustering, and regression. But there are many others. And I know it's possible for machine learning to select its own algorithm today; it's getting there. But I still say that we need to understand that process of selection and be able to make that selection for ourselves. Data science modeling means tuning parameters, iterative experimentation, and data preparation, and you may discover additional data needs or data quality issues along the way. Unfortunately, it's still a back-and-forth process. I don't know of any data scientists that don't have, let's say, something to say about the data infrastructure that they have to live with. And I will just say that the fewer complaints, the better, as far as I'm concerned. I want to build data platforms that allow them to do their day job, not their data job, which is somebody else's. Finally, I will say ML Ops enables ML for those applications. ML Ops draws on DevOps principles and practices, but it's for machine learning operations, and there are some special things about it. Without going into the tactics behind ML Ops, what it delivers is continuous integration and delivery, collaborative development, a business-value focus, and governance by design. I have a white paper out there on this if you want more on this in particular. But I encourage you to be a leader and shoot for this. I haven't gone 1, 2, 3, 4, 5 on this thing yet. 
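As a toy illustration of "evaluating various models and algorithms," here is a stdlib-only sketch that fits two candidate regressors on training data and picks whichever has the lower holdout error. Real work would use a proper library and cross-validation; all the names here are made up:

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Simple least-squares line: y = my + slope * (x - mx)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs) or 1.0
    slope = num / den
    return lambda x: my + slope * (x - mx)

def mse(model, xs, ys):
    """Mean squared error of a fitted model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def best_model(train, holdout, fitters):
    """Fit each candidate on train, score on holdout, return the winner's name."""
    fitted = [(name, fit(*train)) for name, fit in fitters]
    return min(fitted, key=lambda nf: mse(nf[1], *holdout))[0]
```

On data with a clear linear trend, the line wins over the mean baseline; the point is the process (fit several candidates, score them on data the fit never saw, keep the best), not these particular toy models.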
But this is a drill-in on machine learning maturity; we're here to talk about that, right? So these are my five categories: analytic strategy, analytics architecture, analytics modeling, analytics processes, and I threw ethics in there. We have to have that today. It's important, and sooner rather than later. This is something that I think businesses will pay a price for if they ignore it today. So when you get into this, get into it with an ethical foundation. The other four are obvious, okay? What I'm showing here is what a leader is doing in machine learning, and I say shoot for this because I'm not really expecting everybody out there to be mature on this today. On strategy, I won't read all of this, but I'm going to point out a few things. Have multiple data scientists on staff. Multiple, not one. I'm talking about an enterprise here. If you're a midsize organization or below, you have to take all this with a grain of salt, I suppose, but you're not off the hook; all of this is necessary, just maybe at a lower level. New team members are brought up to speed in weeks, not quarters. Yes, I mean the data scientists; you have that great a data foundation in place. Analytics contributions to all major projects are considered. It's not an afterthought. You consider how analytics are going to overlay every project at the beginning, and you do that. You have a central catalog to track all the models. Like I said before, these organizations are trading in models, et cetera, et cetera. It's hard to make manual errors, because you have safeguards in place within your machine learning model development. Logic within the analytics is transparent to a reasonable point. Obviously at some point you're giving it over to the algorithm, but you know generally what that algorithm is doing. Output from analytics is predictable and consistent, with auditable outcomes. Models are reproducible. Moving right along. 
Analytics processes: access restriction is applied to models. Now you're overlaying security on the models of the organization. And let's see, what else do I want to point out here? Analytics applications are monitored for operational issues; they're not just sent off to run and left alone. And finally, in ethics, I'll say: good-faith attempts are made to remove bias variables from models, and the potential for malicious use of analytics is considered in the analytics life cycle. Very important. We could probably talk a while about that, but it's one of my five categories for mature ML. And this is the "and beyond." I don't know that anybody's necessarily here yet, but I'm putting this out there to show you that there's this other mountain beyond the mountain that I just showed you. And it's going to change; what you're looking at here is going to be necessary for business in I don't know how many years, maybe 2022, something like that. So this is what you've got to be shooting for. This is fundamentally different than two years ago, due to machine learning. Again, it's that "we're a machine learning company now" mindset. Machine learning is driving company initiatives, et cetera, et cetera. Full code reviews. Machine learning can be deployed from anywhere, with automated, end-to-end machine learning life cycle support. What else do I want to point out? New algorithm approaches at full scale. Visual model configuration changes. And finally, in the ethics category: cybersecurity experts are engaged in machine learning operations, machine learning systems are protected, there's model transparency and an audit trail for machine learning, and only fully vetted models are ultimately used in production. So again, this is foundational to getting to those applications, and you'll have to decide when you're ready to move. But work on your specific challenges; hopefully I've given you some specific challenges here today that you need to work on as an organization. 
Now, in my quadrant here, I have your skills in-house and your readiness in-house, okay? So if you have trained machine learning people and your organization is ready for machine learning, with some of these foundational elements I just talked about in place, you probably need to continue to work on your DevOps and your ML Ops. That is something you can continue to grow; maybe your organization isn't fully there yet, but you're in the ideal state for today. But if you have trained machine learning people and your organization is not ready, you have to work on that organizational readiness, and, by the way, hope that the trained machine learning people are patient enough to stick around for that. Okay. No trained machine learning people? Well, if your organization is not ready for ML either, then you're in the lower-left quadrant, also known as the world of hurt, okay? So it's time to grow your organizational readiness and grow your machine learning skills. But if you have an organization that's already ready for ML, minus the skills part, okay, you need to grow those ML skills, because indeed there's a lot you can do before the scientists come on board: getting your governance straight, getting your security model straight, getting that Data Lake up and running with detailed data, getting your data architecture tight, et cetera, et cetera (a lot of the things that I talked about here today), getting many data domains ready, or as I like to say, under management. So there are definitely a lot of things in here that you all could be doing. I hope you are doing some of it, I hope it shapes what you are going to do, and I hope you start to think about these things as you go forward, improving your analytic data architecture maturity with machine learning. And with that, we have a few minutes left for questions, so I'll turn it back to Shannon. 
Well, William, thank you so much for this great presentation, as always. Just a reminder to everybody, to answer the most commonly asked question: I will send a follow-up email to all registrants by end of day Monday with links to the slides and the recording, along with anything else requested. So, diving in here, William: early on in your presentation, why do you introduce data governance only in level five? Data governance facilitates and helps the efforts in all other data management disciplines. Oh, it certainly does, but this gets back to how I developed the maturity levels. I looked at companies, what they were doing, and how successful they were, and they fell into five groups. That fifth group is the one that had the great data governance. The third and the fourth had data governance, of course, but not to the level of fully across all data domains, deep, with everything cataloged and defined and all the rules formed and actually in use in the organization. That's what we call full data governance, and that is so hard to do. That's why it doesn't come into the model until, well, it comes in at three, let's be fair, but we don't see it fully done until level five. It's just because it's so hard, and companies are still working on it. I definitely believe it should be done ASAP. That's one of the things where, if I'm asked to do a data warehouse or a data lake or whatever kind of project for a company, we get that data governance in place at least to the degree necessary so it can serve that project, and hopefully from there it can launch across the company. So I definitely think it's important, not diminishing its value at all, but that is where it just seems to come into place. No, I don't think so, and I really appreciate the question, because I think the idea that data governance is a 
muscle we all just need to have at all times is really important, right? Sure, it comes in fully at level five, but it's something we need to be thinking about every step of the way, all the time; it just becomes increasingly important as you get a little bit more mature. Sure. So, when do you recommend a Hadoop architecture instead of a traditional architecture? Well, today, honestly, I'm only recommending Hadoop where Hadoop is already in place, where it's doing a good job and it doesn't make sense to change it right now, and where it's inside of a good, tight data architecture. If any of those things are not in place, we're looking at cloud storage instead of Hadoop, and surely for anything that's going into place today, we're looking at cloud storage for that workload that used to be the domain of Hadoop, if you will. So not a lot anymore, I would say. I would just echo that, William. We are not seeing customers that are actively installing new Hadoop deployments at this point; mostly what they're doing is maintaining Hadoop. Hadoop and MapReduce, and this ability to do massively parallel processing in a database, essentially changed the universe of database design. But the first iterations of it, the true Hadoop architectures, have proven to be a little bit hard to maintain, because you have to scale the hardware individually, or, if you're doing Hadoop in the cloud, you still have to scale cloud hardware. And you see these massive cloud databases, BigQuery, Snowflake, Amazon Athena: they're so elastic, they grow so easily, and they do all the same things that MapReduce and Hadoop would do, but you don't have to spend a lot of time babysitting them. That's why I think we're seeing less new Hadoop; people are essentially maintaining their existing Hadoop databases without necessarily expanding them. The management of those systems is a little bit onerous, and it doesn't add a lot of value to have the 
skills of being able to manage it. What you really want to do is focus on where you're going to drive value for the business, on delivering insights where they're needed. I think we've got time to slip in one more question here, back on the topic of data governance: what are the various ways to implement it? Well, I like to do it on a subject-area-by-subject-area basis, on a priority basis. I will look at something like this (hopefully you've seen my slide) and figure out what's a priority here. I like to get data stewards assigned for all major data domains, even if they're not going to be super active initially, and I like to bring them together to discuss data governance, discuss the rules, discuss the program. Really, just in the setup, you should have a good six meeting agendas in your pocket for a group like that, who are the subject matter experts within their domains. And it's very instructive, I think, to the roadmap of governance and to the project development roadmap as well. I like to do it by subject area, with data stewards representing their subject areas. They come from the business; it's not a full-time role in my book. It's something that they do because they're actually users of the data, so this isn't something that I want to see a company hire for. I want that person to be in place and understand the data, the problems, the good things about it, and have some ideas about moving it forward and bringing that to the table. And, you know, data governance means a lot of things to a lot of people, so I'll just quickly add that there are good data governance tools out there that we are starting to use and see more and more. These tools are great for data governance rules, and what we're seeing now is a new era of this, because those rules can be blasted out, if you will, into all development in the company, especially in data 
integration. So data integration and data governance tools are tied at the hip; you can put your rules in there, and you can ensure now that those rules will be followed, even if the data integration jobs don't specifically say to do it this way. That's making data integration a lot easier. Yeah, so, William, I'll echo that. I'm a little bit more of an infrastructure head on my side, too, so one thing to think about when you're talking data governance is just how complex or simple your data architecture happens to be from an infrastructure side. Something that's simpler, something that's consolidated, with fewer steps (a good example is this idea of the lake house): if all the data is in one place, if there are fewer steps to get it there and fewer places where it's extracted to, then suddenly all of governance is going to become easier, and auditability is going to become much simpler. So there's no easy answer to this problem, the same way as when someone asks how you do data security and you say, well, there's a lot to unpack there. But one thing that makes both governance and security monitoring and auditing easier is just having a simple architecture: removing silos, removing isolated componentry, and modernizing to maybe a more consolidated environment. Those sorts of things are going to drive more successful data governance, a better data security posture, et cetera. Well, thank you both so much for this great presentation and your time today, really appreciate it, and thanks to Looker for sponsoring today's webinar. But that is all the time we have for today. Thanks to all of our attendees for being engaged in everything we do, and thanks for the great questions. I hope everyone has a great day, and stay safe out there. Thanks, all. Thanks, Joel. Thanks, William. Thank you. Thank you.