for joining the latest installment of the monthly DATAVERSITY webinar series, Advanced Analytics with William McKnight. Today William will be discussing what the aspiring or new data scientist needs to know about the enterprise. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #ADVAnalytics. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just click the chat icon in the bottom middle of your screen for that feature. And if you'd like to continue the conversation after the webinar, you can follow William and each other at community.dataversity.net. And as always, we will send a follow-up email within two business days containing links to the slides, to the recording of the session, and additional information requested throughout the webinar. Now let me introduce to you our speaker for the series, William McKnight. William is the president of McKnight Consulting Group. He takes corporate information and turns it into a bottom-line-producing asset. He's worked with major companies worldwide, 15 in the Global 2000, and many others. McKnight Consulting Group focuses on delivering business value and solving business problems, utilizing proven, streamlined approaches in information management. His teams have won several best practice competitions for their implementations. He has been helping companies adopt big data solutions, and with that I will give the floor to William to get today's webinar started. Hello and welcome.
Hello Shannon and hello everyone. Thank you for that introduction. Welcome back for those of you who join me every month at this time.
We are rolling on the second Thursday of every month at two o'clock Eastern, and we'll be going right through 2020 with some more topics all about advancing analytics in the enterprise. That's my passion, and I trust at least in part your passion as well, and I want to share some of that with you today, because there is an emerging role out there that has a lot to do with this, and that's the data scientist. There's some confusion and there are a lot of different definitions of this, and I wanted to cut through the fray and help you navigate that complicated landscape and get data science up and going in your organization with a data scientist. Forbes says that 85% of US CEOs and business leaders are AI optimists. 87% are investing in AI initiatives this year, and 82% expect their businesses will be disrupted by AI to some extent within the next three years, and I see the data scientist as being really critical to that disruption. Only 29% of companies are making use of artificial intelligence today, regular use that is. 41% of organizations are planning to invest at least $500,000 to support AI initiatives over the next 12 to 18 months. The federal government is spending a billion dollars on artificial intelligence this year. Indeed, they are a huge user of artificial intelligence. Now, I have also found in my walk that venture capitalists used to look to make sure that everybody had a big data strategy in whatever they invested in, and then it was the cloud, making sure that it's oriented to the cloud because that's where things are going. I definitely agree. But now I've noticed that they are looking for artificial intelligence as well in the things that they invest in. So everybody's got to have that kind of a strategy. It doesn't matter if you're opening a restaurant, you have to have an artificial intelligence strategy. You have to be ready to answer the artificial intelligence question even there.
But despite the fact that this is important, nearly eight out of 10 enterprise organizations currently engaged in AI and ML report that projects have stalled and 96% of these companies have run into problems with well guess what the things we do right data quality, data labeling required to train AI and building model confidence. So a lot of it has to do with data as you'll see as we go throughout here. I'm going to say that the data scientist is very important in curating the data environment of an organization. They're going to need to be successful. Now this talk, let me back up here, it's what the aspiring or new data scientist needs to know about the enterprise. So yes, I'm talking to you if you're a data scientist. I'm talking to you if you're aspiring to be a data scientist. But I'm also talking to those of you out there that are going to be hiring or have hired the data scientist. This is what they need to know. And it's what they need to know about the enterprise. Of course in this next hour I'm not going to share all kinds of data science type information in terms of the tooling and things like that. But there are things that I have noticed in my consulting that companies are forgetting to train their data scientists on and even bring into awareness. And as I reviewed this presentation yesterday, I was looking back and saying, you know, a lot of this content is not just for data scientists. It's for anybody that steps into a new role in data. So maybe you'll feel that way about it as well. But data science pioneers are locking in. Locking in right now. Right now is the time to lock in and get the exponential returns on investments in this important area of artificial intelligence. If you agree with me that it's important. And if you follow these webinars throughout the throughout the year, I've had a lot of talks on artificial intelligence, artificial intelligence, machine learning, NLP, the data required for AI. 
I focused on all of that because I think it's just so important to you. Data science pioneers are out there today. What do they do? They let the data speak. The data speaks for itself. We're getting away from the idea of the HiPPO, right? You know about the HiPPO, right? The highest paid person's opinion. That's not enough anymore to carry the day. They are using statistical models and machine learning. And they're generating deep business implications, not shallow ones, but deep ones. And they're now dealing in algorithm management. Yes, despite the statistics that I mentioned before about the high interest, but also the high challenge, with getting to AI, there are those who have broken through and are getting into this first wave, and I think they are going to have exponential benefits as a result. And here are some of the things that the data scientist can aspire to do within an organization, or has aspired to do and already has done within some organizations. These are some areas you can think about. Obviously, you may or may not find your way into one of these, but hopefully there's some derivative of some of the things that you see here. Fraud detection, chatbots, in-car navigation, all sorts of transportation implications of artificial intelligence. Reduced cost of handling misplaced items, just automating things, predicting the future, whether it's flight delays, whatever it is that's important for you to be able to predict. A lot of artificial intelligence is about predicting the future. It's advanced analytics. And in advanced analytics, I actually had a webinar on this, it's important to go all the way to prescriptive information, not just be able to predict that future. So here are some of the pioneers, and hopefully you'll be one of them sooner rather than later. This represents profound change requiring commensurate strategic focus and urgency. This should disrupt the current thinking process and can produce high-impact enterprise outcomes now.
I'm fortunate to be a part of some of that. Now, there's a grayscale, to be sure, between analytics and data science. Data science is a means to deliver analytics to the organization at a rapid pace and a more accurate pace, involving a lot more information, which makes it more accurate. These are reports, correlations, predictions, recommendations, and interactions. So we're moving from what happened to why it happened, what will happen, how can we make it happen, and influence what happens. Now, last month I talked about predictive versus prescriptive. If you want more information on this, you can go to the archives and get that. But data science is now on the data usage spectrum. So it used to be we did reports, right, BI, good old BI, good old reports. Then we sort of advanced to dashboarding, which is a kind of highly summarized, curated form of reporting. It's interactive, it's a huge step forward. We put some of that on the web. We did some streaming data into those dashboards to make them more real-time and things like that. We did visualization then to take it to yet another level and say it doesn't have to be a dashboard. It can be many things, many ways, many form factors in which we could represent data within the organization. But now a way to represent data, if you want to think about it that way, is data science. Data science is a means to derive highly valuable insights for the organization. And it does this very fast and mostly without coding. And another thing, I don't know if I have it in this presentation or not, but another thing that I say a lot is wherever you're thinking about BI, think about AI. Why am I so highly focused on AI? I thought we were here to talk about the data scientist, right? Well, I'm building up to this thing I'm going to say time and time again, I'm sure, throughout my talks.
And that is that artificial intelligence and machine learning is the top job of the data scientists. That is what a data scientist should be doing, not just more analytics, not just using BI tools, but getting insights through artificial intelligence. Helping the company make that sort of leap is my aspiration for anyone who calls themselves a data scientist. Now, it's a varied role. Let's talk about the role. Part business analyst, part high skilled programmer, high level statistician, and an expert in the industry in the company, company's domain. Now, many data scientists I've noticed are hired having some great credentials, and we'll talk about that in a minute, but they're not told about the enterprise. And so there's a disconnect because a lot of data scientists come in a couple different flavors. Actually, I'm going to get into this in a bit, but sometimes they're fresh out of school, or maybe they've gotten themselves retrained, but they were never really in an IT position before, but maybe they got a PhD in this. But they're well trained, but they're not involved in... And when I say IT, I just mean it kind of loosely. I know there's pockets of IT all over the place. I don't just mean the central IT, but I mean a technology role within an enterprise. If you're in that kind of role, or if you haven't been in that kind of role, there are things that you need to know to be successful. And indeed, I think the pressure is on the data scientists to deliver, and they're going to have to balance a few things. They're difficult to find. There's a lengthy non-linear recruitment process. I don't want to dissuade you, but if you're stepping into this, these people are highly valued, and the recruitment process can be a bit of a sinkhole of your time, if it doesn't result in a win here. They're difficult to retain as well, not just hire, but also retain. 
The top jobs that they will do, or they should be doing: high-skill data analysis and interpreting, data architecture. You might say, well, why? Why data architecture? I thought that was somebody else. Well, most data architectures are not up to the level that they need to be, and a data scientist needs to be able to guide some of that direction, if not do it themselves. I'll get into this. Data modeling, but AI and ML, that's your top job. Now, you can clearly tier your data scientists within the organization, like maybe have a key one or a chief one, and have multiple. Large organizations are going to do this. Most organizations, I would say, probably have one. Even big organizations sometimes only have the one, and then they have lots of data analysts, maybe, that are aspiring to join that. Data science certainly would have to prove itself for that to become a big thing in an organization. But I'm here to start that process of helping you make it successful, so let's work on the specific challenge that you have. And you have come to this webinar from one of four places, I'm going to suggest. Maybe you're in an organization where you have data scientists, or you're seeking data scientists, and you think about a data scientist as someone who's trained in data science. So that's sort of top down. But your organizational readiness is the other key factor here on the quadrant. Some organizations are ready for a data scientist. They've done some of the things I'm going to talk about in this session. They've prepared the way. But others are not. And I put that in red because I think that is really key here for an organization to be successful with a data scientist. It's a two-way street. You can't just drop them in and say, okay, away you go. Make something happen here. You have to pave the ground. You have to fill the gaps of what they may or may not know and understand about the enterprise.
And that's what I'm here to do, to fill in some categories that you may want to help that data scientist out with. On the other hand, if you're the data scientist and you're not getting those gaps filled by your organization in order for you to be successful, at least you're going to have the awareness of what's going on in the enterprise. And hopefully that will help you. A data scientist may also come from an analyst position. I've seen this many times. This is sort of the fake it till you make it. And I don't mean that pejoratively. I mean that as a compliment. If you've been able to make it from a position where you did not come in with the PhD in data science, if you're really doing these things, okay, I know some organizations hand out the titles here. But if you're really a data scientist and you're really doing the things I'm talking about here, then congratulations. And an organization can certainly thrive with that kind of role. I'm not here to dissuade that kind of data scientist, but that data scientist is going to have some challenge around growing data science skills and maybe even with the organizational readiness factors. So these are your quadrants. Where are you? I should probably do a poll now, but you do it yourself. Figure out where you are on this quadrant. Is your organization ready? And are you hiring data scientists bottom up, from growth, from an analyst position, or top down, bringing in somebody that's maybe fresh off a PhD program in data science and looking to find their way in your organization? Backgrounds, all kinds of backgrounds. These are the best backgrounds. So basically it's STEM stuff, right? Physical sciences, computer science, finance and economics. We certainly have our fill of people from those backgrounds that are doing well in data science.
And by the way, I do a lot of data strategy work for organizations, people, process, technology across the board, looking at all their data platforms and so on, and looking at their people and recommending where they need to go. And more times than not, I'd say probably in 75% of the cases in the past couple of years, I have recommended that the organization either get more data scientists or, most of the time, get their first data scientist aboard. So I've had some experience at helping companies onboard their data scientists, looking at backgrounds and things like this, and also being part of the full life cycle of things. Seeing them come into the organization and seeing various degrees of success over the course of time. So that's a bit of what I'm sharing with you today. Going a little deeper here, what are the skill requirements for AI and the data scientist? Now, I'm starting to put the slash in there because, as you are now aware, I am saying that artificial intelligence is job one of the data scientist. So to me, you have some growth to do if you're a data scientist but you have not gotten into artificial intelligence. It's not just high-end data analytics. We need more than that today in order to put a stake in the ground around competitiveness. So here are some of the things that I look for, that I hope you look for. Math and statistics. Python, yes, that seems to be a fairly ubiquitous language and really not that difficult. Maybe some exposure to the libraries. And then there's TensorFlow, which is becoming very popular. There's also R and MATLAB. You can get away with that for sure in an organization. Java and Scala are a little bit of an old-school approach to data science, but still workable within an organization, even scalable into the future. I don't have a problem with this kind of skill set either. It works well with Hadoop and Spark. If you are so inclined and committed that way, great. And soft skills, which I'll get to in a second here.
A data scientist needs at least a basic understanding of statistics, including distributions, likelihood estimators, and familiarity with statistical tests. Needs to understand when different techniques should be applied. Multivariate calculus, linear and matrix algebra are good to know, plus the programming and database skills that are mentioned on the slide here. A postgraduate degree is preferable. There is a lot to learn. Postgraduate degree does indicate a high degree of ability with this stuff. It's not absolutely required, but it's certainly a great nice to have. But there are plenty of specific data and now analytics and AI certificates. And there's work experience. But work experience on a single job is not enough. The candidate needs to be able to, or maybe the data scientist, needs to be able to continue their education outside of the organization. Any single organization on the job training is not going to be enough to get to the level that an organization needs out of their data scientist. That data scientist needs to be spending a lot of time in education. And I'll get into that in a little bit. In terms of soft skills, now, ideally, ideally, but not realistically, a data scientist doesn't have to have a lot of these soft skills. They can work in a closet, and blah, blah, blah, and be successful. But let's bring reality into the mix here. In order for that data scientist to get something done in an enterprise of today, these soft skills, which are harder to learn than the soft, sorry, the soft skills are harder to learn than the hard skills are going to be necessary. A business goal focus. Ultimately, and I shouldn't even say ultimately, but pretty quickly, I would say, yeah, we need to get aligned with business goals and deliver to them. Now, I think that in the future, the data scientists will be setting the business goals. And I have no problem with data scientists who are able to aspire to that today. 
But that's not where most data science is today in organizations. So getting aligned with current business goals is great. Having communication skills, the ability to communicate their results in practical terms to the rest of the enterprise, which usually is needed to carry out the findings of the data scientist. They have to have a detail orientation, and they must be curious and creative. Yeah, I hope you didn't think that it's all technical and it's all quantitative. There's a lot of creativity that's required. There are many ways you can take data science, many languages you could use, but more importantly, many algorithms, many things that you can seek out. And you have to have great intuition about where to spend your time. Hopefully you have just a bit of a free hand. And this is another thing, we'd love to be able to just give a data scientist a free hand. But if you haven't trained that data scientist in what's going to work within your organization, that free hand is worthless. So that's why some of the things I'm going to be talking about here today are very important in order to enable the data scientist. Especially when working with new data sets or developing new models, data scientists don't always know what they will find. They need to test their assumptions and not be afraid to try different tactics if the initial ones don't work as planned. So a bit of assertiveness probably could have made its way onto my slide here. You need a data scientist who thrives in exploring the unknown and doesn't consider a wrong assumption to be a failure. And that's as much for the data scientist to know as for the enterprise to know. Okay, here are some ideas for growing data science. Maybe you're an aspiring data scientist, or maybe you're a new data scientist and really want to ramp up. A lot of times I see data scientists feeling quite alone in an organization because nobody's quite at their level in terms of knowledge of this.
There's no other data scientist around. There's a lot of isolation that can happen. So you can do some things to combat that. Shadow an existing data scientist. Maybe if the organization has a data scientist elsewhere, you can grow into advanced levels of the role by shadowing one. There is plenty of university data science training. There's online education courses right here at Dataversity, as well as hackathons, which make a lot of sense to hone one's skills. Now, myself, I have taken the training of Andrew Ng. Of course, that's an artificial intelligence, but I believe that's job one of the data scientists. And they were very good, but there's others out there. And like I said, the data scientist needs to continue their education. So what is the pattern for artificial intelligence? It's something like this. Hire and grow your data science, which is what we're here to talk about, uncouple AI from organizational constraints while conforming the organization or growing the organization, right? The organization needs to be ready. And I put that in red earlier. Let me highlight that. I put that in red earlier because that's a real problem with getting the most out of data scientists. You can certainly frustrate the data scientist if some of these things aren't in place that I'm going to get to. Compile your data. Yeah, it's a lot about data. I had a full webinar on just that topic, so I won't say too much about it. But I think they're going to work a lot in a data lake. There's going to be internal data and external data coming into that data lake. They need a lot of data to be successful. Or if they're not asking for it, then that might be a problem. Okay. They need that data labeled to build the model prototype, iterate it, productionalize it, or somebody needs to productionalize it. We need to understand the full life cycle of data science within the organization. Who's going to take the models, the findings, from the prototype, if you will, into production? 
Because nothing counts until it gets into production. Let me repeat that. Nothing counts until it gets into production. And so it's not good enough to work in isolation in a development environment, and you may need to surround the data scientist with the full SDLC, what have you, to get their wares into production. I have data scientists, or I know data scientists out there that are unfortunately burdened with this role. Now, somebody has to do it. And I guess I don't mind as long as the data science gets into production. In my mind, ideally, there's some support structure around to QA and blah, blah, blah, and all the way to production, the data science. So, several steps here about data. Machine learning is not usually about finding the right answer, but finding the sufficiently probabilistic optimal answer. Yeah. Not the perfect answer. You can wait all year for the perfect answer. That's too long. You've got to find a balance in there. The sufficiently probabilistic optimal answer given the data. Given the data. And if not much data is given, not much should be expected. And unfortunately, a lot of organizations are falling short in this area. So, let me give some advice to the data scientists, especially the analysts plus data scientists, okay, the ones that are growing into that role. We're all growing into the role, right? I don't think any data scientists out there except maybe a handful have aspired to the full duties of the role. And some of them we know about and some of them we don't know about because they're doing such great things for their organizations. I want you as a data scientist to be doing that. Here's what I would tell them. There are lots of data platforms. Know this about data architecture. Know what these platforms are that you're looking at here. I'm not going to describe them here right now, but know that there's no one size fits all. Data is all over the place in an enterprise. Take a look at that environment. 
In a lot of organizations, more than half certainly, what I like to say is there's no such thing as architecture. And that is to the full detriment of that company. That company is suffering because it does not have true architecture. It just does things. It does databases, all right? Maybe it's driven by the business in terms of the business telling the tech team, maybe their tech team or the central team, who knows, telling somebody what to do, but maybe they're not informed and maybe they're not thinking about enterprise architecture. It doesn't take longer or more budget to do it right. But what it takes is the knowledge and the focus, the ability to focus. But in a lot of organizations, there's not only what you see here, there are all sorts of permutations of what you see here. Maybe many, many data warehouses, data marts that are generously called data marts, but really are databases that are put in place for a very specific purpose. And so I can get real cynical here, but I'm trying not to. So I'll stop right now. I don't really want to do that. Oh boy, here's another slide that you can get real cynical with. And I'll try not to, but what the data scientist is stepping into in most enterprises is not a nice, efficient, clean, whiteboarded information architecture. And I am not here, and nobody's here, to point fingers and say, well, you know, why did we get here? You're here. You're here. You have a spaghetti architecture. What's the first step in making things better? It's acknowledging, right? And getting through the seven stages of grief about your architecture. Okay, let's move through them. Let's get to action, wherever that is on the seven. Let's get there. Let's get past the denial. This is what it is and you're suffering because of it. We have not encountered a hopeless situation, but many are close. Today, I must say, many are close.
If your architecture looks like this and you don't care and you're not doing anything about it, you are close to extinction, not just as a data company, but as a company. Treat architecture as an organizational discipline. Unfortunately, most of the data scientist's time will be spent searching for, figuring out, cleaning up, and combining data, doing data integration and things like that. The data architecture of 2019, I believe I've given a full presentation on this. Let me just build it out. It's complicated. And this is a clean-looking version that you could put in a book. So do not think it's as clean as this. There are a lot of components. A lot of the components that I showed you on the previous slide are here. And I think the data scientist, I think the organization, needs to draw this for themselves. You might be surprised at how few organizations have a picture of how the data flows within the organization. There's the operational side and the analytical side. There are some things that cross the stream for sure these days. But there are some specifics to being on the operational or the analytical side. There's the data lake. There are probably several data warehouses, which I'll talk about in a minute. That's okay. That's reasonable to some degree. Sometimes these databases are turned columnar. You have various data marts in place because the data warehouse is not a one size fits all. Anyway, what's your data architecture? The data scientist needs to know. The data scientist needs to figure out where all the data warehouses are. They will use the data warehouse for data. A lot of people ask, and maybe you're wondering, where does the data come from for data science? I kind of said the data lake earlier, didn't I? Well, actually it comes from wherever it is that's accessible. Accessible, great data. I'll give you some definitions of that. The data scientist cannot use dirty data that was put into any old database and stuck out there.
They need a slight bit of refinement to the data. Not like an end user. We're not talking that level. They don't need a dashboard, but they need to be able to trust the data and so on. That data is going to be in a data warehouse for sure, or two or three. This is what I'm here to tell you, that many organizations have multiple data warehouses with what I call flavors to those data warehouses. Customer experience, asset maximization, operational extension, risk management, finance, and product. Nobody could have sat here 10 years ago and said, well, this is what's going to happen to data warehouses in an enterprise, but this is how they have evolved. Many of my clients who are enterprises have multiple data warehouses, and they may have two or three of the ones you see here. They may have one that spans two of the functions and so on, but this is where data warehouses have morphed. I did a study on this. This is where data warehouses have morphed to. So please note, depending on the history of your data warehouse endeavors, a data warehouse may have multiple flavors. Most guides focus on data warehouse product selection, but today, when many companies are decades into data warehousing, we know that no two are alike. There can be radical differences, and this is what the data scientist needs to know. If they're fresh off the training into your organization, they need to know it's not going to be clean. There are going to be multiple data warehouses, and they'll come from various of these flavors once again. But the data scientist will spend most of their time in the data lake. It is the data scientist's workbench, and it also serves as data warehouse staging, which is of lesser consequence to the data scientist. A data lake is a collection of long-term data containers that capture, refine, and explore any form of raw data at scale, enabled by low-cost technologies like you would employ today in the cloud, in cloud storage.
Today, data lakes should be in cloud storage. I had a whole webinar on this. I won't explore that here, but seek out where that data lake is. And repeating a refrain I mentioned earlier, when I do my data strategies for companies, yes, I come up with a data scientist, but I also want them enabled. So, yes, I'm also going to strategize and architect a data lake for most of my clients, because that is something that they're going to need in order to aspire to great data science. So, Mr. or Mrs. Data Scientist, you'll need many data domains. Some of the things that you'll want to do for your company are on the left here. Of course, that's a very short list, but think about, and I haven't done it here, all the data domains of your business. There are these and more in most businesses, and then each business has its uniqueness. Yeah, every business goes off in different directions from here, but think about these objectives and how much data they're going to need. And this is where I'll say it'd be great if you had master data management in place, and in my architectures, master data management feeds data into the data lake. Yes, I know it's small data. Yes, I know it's highly refined. I know it's relational. But you need this kind of dimensional data in the data lake, where the detailed data is, in order to make sense of that detailed data. The data scientist will want this type of master data, and I'd say probably half the time it's not going to be that clean finding where this data is. And so that's why I say it's a bit of a patchwork approach. Data is going to be in the data lake. I'm trying to centralize the data for the data scientist in the data lake and the data warehouse. All right, I'm not going to put everything in the warehouse into the lake, and definitely not vice versa. But if we can contain the data in those vessels, we are going a long way. Here's what the data scientists need.
Are you prepared to deliver this to make that data scientist maximally successful? All right: management of data sets in both learning and inference; the ability to manage identities and access controls over data sets, models, and insights; the ability to use CPU and GPU; the operational monitoring of ML jobs for efficiency and quality of insights; and transparency and trust to build confidence in the results. Data scientists need platforms and tools that enable them to work as efficiently as possible, including these features. This is what the data scientist is going to need to develop out of the data that is provided to them. And as a result of such capabilities, data scientists can add the benefits of continuous learning. The faster that scientists can iterate through test-and-learn cycles, the more quickly they can arrive at genuine insights. So the data that they need has to be in a leverageable platform, not something that was kind of slapped together or made under the gun and kind of stuck out in production. That's not good beyond one use. It was probably built for that one use. The data scientists will have 20 to 50 to 100 uses for that data. It has to be in a leverageable platform. And in my mind, that's the data warehouse or the data lake or master data management or an operational data hub, something with some serious non-functionals applied to it, some serious thought done by data professionals. That's the kind of platform that the data scientists can devour. It's an appropriate platform for the profile. It's not that everything is in, name your favorite, you know, operational database. Even analytical data, right, should not be in those databases. Things should be in fit-for-purpose platforms. And that's the whole essence of that architecture that I showed you a few slides ago. With high non-functionals: availability, performance, scalability, stability, durability, and security.
You want your data scientists getting data out of platforms that don't provide that? That's certainly going to hamper the benefits. Data is captured at the most granular level. And a lot of history is made available. Up to the point at which legal says now we want to get rid of that data, which is seven years for a lot of my clients, but whatever it is for you, the data scientists can use that data. Data is to a data quality standard, as defined by data governance. And this is where a lot of architectures fall short. The data is not to a quality standard. Data quality was not applied to the data. We just locked and loaded files. Oh, I'm getting cynical again. I'll stop. But I'm trying to describe an environment where the data scientists will have some challenge. When you have not applied any kind of rigor to incoming data, and I have clients that do a lot of third-party data, even that data, heck, mostly that data, must be at a data quality standard. And speaking of the data quality standard, again, for Mr. and Mrs. Data Scientist out there, the data that you encounter may be of suspect quality. And that means it lacks one of the things that you see here. And I say may; that's kind of putting it mildly. You will encounter some less than ideal quality in the data. So you're ready to go. You're rubbing your hands together. You got your tools and so on. But boy, the data had better be ready. It better have some referential integrity to it. You can't be dealing with data with customer number 123 when there is no customer 123. And you can probably find 100 places in your data where that's a rule. And you don't have to enforce the RI. I don't want to stay too long on RI. You don't necessarily have to enforce it through the database, but you better have it in the data, at least to a high degree. Uniqueness where you want uniqueness. Cardinality must be appropriate. The subtype, supertype rules, right? Value reasonability.
It's not appropriate to have someone who is 250 years old. It just doesn't happen. Not appropriate to have a customer that is five years old in most businesses. Consistent value sets, formatting must be right, data derivation, completeness, no holes in the data. Here's a lot of data since 2010, but 2014 is missing. That limits the value of all the data before it. And by the way, this is where I'll say communicating data quality through metadata is a great idea and will help that data scientist. Conforming the data to a clean set of values. Now, again, advice to the data scientist. Don't hammer on this until the data is 100% quality. It never will be, but it needs to exceed a standard. That is for sure. And the other piece of advice, I don't have a slide on it, but I will mention it to the data scientist: find out what the status of data governance is in the organization. I hope I didn't get too loud there. The status of data governance in the organization. And know that that is a body that you will have to deal with. And I don't mean to put it like you don't want to. You want to. You want the organization to have data governance because it will, supposedly, help you work with high quality data. Somebody will need to focus on the non-functional requirements. Somebody. I hope it's not you, but it might be you. And what you have to decide when you step into the job is, what do I mind doing? What am I good at? What do I want the organization to grow so that I don't have to do this? And am I willing to spend time on this when I'm trained to do something else? And that might be a higher calling for my services within the organization. Yeah, you can say that. Somebody, though, must provide in the full life cycle, again, since it doesn't count until it gets to production, scalability, performance, reliability, maintenance, capacity, security, blah, blah, blah, all of this. And security is huge.
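The data quality rules described above (referential integrity, value reasonability, and completeness) can be sketched in a few lines of code. What follows is only a minimal Python sketch, not a definitive implementation: the field names (`customer_id`, `age`, `year`), the sample rows, and the age bounds of 18 to 120 are all assumptions made up for illustration, and the real standard would be whatever data governance defines in your organization.

```python
"""Illustrative data quality checks. All names, rows, and thresholds are hypothetical."""


def orphaned_keys(child_rows, parent_keys, fk):
    """Referential integrity: foreign-key values with no matching parent key."""
    parents = set(parent_keys)
    return sorted({row[fk] for row in child_rows if row[fk] not in parents})


def unreasonable_values(rows, field, low, high):
    """Value reasonability: rows whose field falls outside [low, high]."""
    return [row for row in rows if not (low <= row[field] <= high)]


def missing_years(years_present, first, last):
    """Completeness: years in [first, last] that have no data at all."""
    present = set(years_present)
    return [y for y in range(first, last + 1) if y not in present]


# Hypothetical sample data, echoing the examples in the talk.
customers = [
    {"customer_id": 101, "age": 34},
    {"customer_id": 102, "age": 250},  # nobody is 250 years old
    {"customer_id": 103, "age": 5},    # too young for most businesses
]
orders = [
    {"order_id": 1, "customer_id": 101, "year": 2010},
    {"order_id": 2, "customer_id": 123, "year": 2011},  # no customer 123
    {"order_id": 3, "customer_id": 103, "year": 2012},
    {"order_id": 4, "customer_id": 101, "year": 2013},
    {"order_id": 5, "customer_id": 102, "year": 2015},  # 2014 is a hole
]

orphans = orphaned_keys(orders, [c["customer_id"] for c in customers], "customer_id")
bad_ages = unreasonable_values(customers, "age", 18, 120)  # assumed bounds
gaps = missing_years([o["year"] for o in orders], 2010, 2015)

print("Orphaned customer ids:", orphans)
print("Customers with unreasonable ages:", [c["customer_id"] for c in bad_ages])
print("Years with no data:", gaps)
```

Checks like these report violations rather than fix them, which fits the advice to measure the data against a standard instead of hammering on it until it is perfect. The results could also be published as metadata so the data scientist knows what to expect.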
Few data scientists I know are data security experts. But yet we must surround their work with great data security as we know it today. So, again, they can't work in isolation. They have to work in conjunction with the rest of the organization. Non-functional requirements are used to describe the quality attributes of the system and the constraints which the design options will be required to satisfy in order to deliver the business goals. Requirements assist in determining size and cost and viability of the proposed system. So, get non-functional requirements, or somebody must get and execute on the non-functional requirements of data science within the organization. Now, I didn't say ROI on here. I probably could have. Where does the ROI come from in what the data scientist does? There are companies out there that are frustrated with the lack of, or the perceived lack of, ROI out of data science. To some degree, it's fair to expect ROI, but it may not be fair to put that all on the data scientist. Hopefully, the data scientist is not performing kind of an accounting or finance role and looking for the ROI of the solution. They should be working on things that are breaking higher ground and getting into new products and new customer sets and new insights where, you know, the ROI is sort of boundless, right? But let's deal in reality. Advice to the data scientist: this may come up, and you need to know how much of that you need to deliver and who's taking on the rest of it and taking it full life cycle. So, yeah, I'm speaking to the data scientist now, as you can see, as if the data scientist is forming a little department, a full life cycle department within the organization with a lot of interfaces, and that is true. That is kind of the way it is. That's the advice. Okay, find an application to ride to production.
The data scientist will get a budget, but that budget will not be unlimited and it won't be around forever if there's no production, and you need to know where the production of your work is coming from and where it's going within the organization. A little goes a long way. Not everything has to be a home run either, but you better be hitting some singles sooner or later that drive to those business goals that I was speaking about, and translating how that turns into the bottom line for the organization, because when things get tough, the ROI discussion gets a little bit louder within an organization. So, advice to the data scientist coming into an organization blind: organizations exist to make money. They don't exist for data science. They exist to make money, and you better not forget that. Another thing I would say is organizations are trying to run as agile. So, how do things happen within an organization? How do things get to production? How is development done? Well, agile is what all organizations, really, to some degree or another, are aspiring to. I used to have to fight this as a battle within organizations. Yes, we're going to do agile. You know, my team is coming in. We're going to do it as agile. Is that okay? And the answer is yes. And let's look at how you're doing agile. Maybe there are ways you can improve it across the board, and that goes a long way. You know, it's not just what you do. It's how you do it within the organization. What you do might be great, but if you do it slow, if you do it disconnected from the organization, and if you're not delivering pieces along the way, if you're doing a big bang and the organization will see things in six months to a year out of you, that's not good enough anymore. That's not what corporations value. They value agile. So, know a little bit about agile roles, sprints, commitments. You don't have to be an agile expert.
You don't have to go get a belt with a color or whatever, but you have to know some things about agile. And you may be on or lead an agile team. Who knows? It might go that way. You may want to run your data science organization as agile. Again, I'm not trying to pull you into the minutiae of the day-to-day. I wish you could go off and work on algorithms and come out with great insights that just knock it out of the park, but that's not the reality. And that's the bottom line of this presentation, is reality. And dealing with that reality and still being successful. How about this? How about this? Organizations will need to do organizational change management to get your work accepted. Not everybody within the organization is sitting there with open arms, can't wait to see what that data scientist does and can't wait to embrace it. Not how it works. Organizational change management is going to be required for that organization. So, a question I would have is who's going to do that? Maybe you. I hope not. But again, you have to decide. Today, I'd rather the organization have OCM involved in all projects, including what you're doing as a data scientist, and be fully up to speed on that and doing it and blah, blah, blah. But it may or may not be reality. So, some OCM tasks may be done at a release or product level. So, I'm using agile terms here, scrum terms. And those tasks are stakeholder management, training, and future state job roles. Others may be done for epics and features. And this is slightly bigger than, or slightly smaller, maybe quite a bit smaller than a release or product. Deployment readiness. Because you're constantly deploying, right? Deployment communications. Future state job roles. There it is again. It's that important. It happens a lot throughout agile cycles. So, hopefully you don't have to worry about this too much. But do know that people resist change. Don't be shocked. This is advice to the data scientists. 
Don't be shocked when not everybody in the organization embraces what you're there to do. Data science modeling. Evaluate. Now, this is slightly more technical than some of the other advice, but needed. Evaluate various models and algorithms. And here are some of them. I actually had a whole webinar on them, so I won't go into it in detail here. But don't become one size fits all. Don't become monolithic. There's high value in applying the right algorithm and the right model to the right work product. And so, tune parameters. Do iterative experimentation. Data preparation. It's okay for you to roll up your sleeves and do the last mile of data preparation. It's not okay, it's not workable long term, for you to do the whole football field's worth of data. Again, the organization should bring a data architecture that is at some of the levels that I talked about here. Not too many data warehouses. A data lake, for sure. Master data from somewhere. Maybe MDM, hopefully. That data is in the data lake. All the data is leverageable. It's of sufficient quality. It's at a high granular detail, et cetera. Those are the things that you can do as an organization to help. You may discover additional data needs or data quality issues. Yeah. You might very well have input to data architecture. And hopefully the data architecture team or individual is there to pick up the ball and take it forward. You may feel or be alone in the organization. And I've given you some tips about that earlier. So, here's some other advice. Allow ample time for skill building. Stay close to business strategy. Don't divorce yourself from the business goals. Eventually, you will set business strategy. That's the mindset I want the data scientists to have. I want the data scientists to feel comfortable with that. I want the organization to go that way. A lot of organizations talk about being data driven. This is data driven. This is what it means. And this is great for an organization.
I love to see this, or see this in process: data setting business strategy. Look to digital leaders. For me, digital leaders are Amazon, eBay, Apple, Walmart, and many, many others that I just happen to know about in my experience that may or may not be interesting to everybody, but I know they're doing great things for their companies. Hopefully, you have that list in your head as well. Keep up on what those companies are doing. You can learn a lot from our digital leaders, our best practice award winners, our case studies. You might pair with data scientists across different domains, maybe even across different companies. There may be meet-ups, or I should say there are meet-ups for this. There are some right here in Dallas. Okay. Collaborate across your company with data scientists. And don't forget to bring this along: ethics. Elon Musk said AI is our biggest threat. I don't know if I agree, but whatever, it's a threat to some degree. And you can think about ethics in all of the things that I show here, just sort of a grab bag of where that ethical conversation around AI is coming from. So don't get down the path too far with AI. That's your top job, right? Don't get down the path too far without thinking about the ethics of what you're doing. We may not be setting in place algorithms that fire off weapons. Some AI is doing that. We do have to watch for bias in our data. If we generate training data, that has to be bias-free and really representative of the enterprise. That's difficult. We have to have transparency. Can't just say, well, the algorithm did it. We have to know more than that, especially today in this era of compliance that we seem to be entering into. We have to watch out for fake news and, you know, for assuming that everything we read, every piece of data that we get, is accurate, especially if we're out there, you know, scanning the web or screen scraping the web or whatever, bringing in kind of outside information that way. You have to watch out for that.
Surveillance, birth, trade, AI rights, you know, does AI have rights? You're probably not getting into that too much, but it may be the output or the downstream implication of what you're doing. Be a leader. Be a leader. Shoot for this. It's not going to happen on day one, but I like to have goals that I'm aspiring to, that I'm working towards. And this is my eyesore for the day. But anyway, goals in analytics strategy, analytics architecture, analytics modeling, processes, and ethics. Shoot for this. What I show you here: to have multiple data scientists to justify the job, to have new team members brought up to speed in weeks, not quarters, because things are organized. Have analytics contribute to all major projects, not just some, all. You can substitute data science here. Have a central catalog tracking all models. That's the beginning of creating a model sharing organization. So I'm not going to read all this, but the point is that you should have goals for yourself across these five categories and put a date on it and go beyond that. Grab the slides if you want all of this, right? Okay? So go beyond that. This is another level yet. This is a highly mature organization here in terms of artificial intelligence, analytics, and data science. It makes the business different than what it was just two years ago. It's driving company initiatives. Everybody on this may be the same team. Full code reviews can be deployed anywhere. Again, I'm not going to read it all, but there's a lot involved in here that you want to aspire to. And if you don't like this, that's fine. Create your own. But have something that you're aspiring to that's the bigger picture of what you do, not just the ROI day to day. I didn't say the job was easy, did I? The job of the data scientist. I didn't say it was easy. And also, hopefully, I made the big point that you're not going to sit in isolation in the organization. Organizations aren't that ready.
But I hope I'm also giving advice and maybe some discomfort to organizations out there that are thinking they're going to hire a data scientist tomorrow and they're going to be productive. Go ahead and hire the data scientist, but you've got to show the data scientist that you're on the path to bringing assets to bear in what they're doing. And don't forget, you know, it's not just ML, it's MLOps. MLOps draws on DevOps principles and practices. It gives you continuous integration and delivery, collaborative development, a business value focus, and governance by design. These are some of the benefits of MLOps. So it's one thing to get ML in production. And I have a, you know, round of applause for that, of course. But to operationalize this, to systemize it and to make it happen at scale, and I have clients that have millions of models if I added them up. Millions. You can't get there. You cannot get there without MLOps. Focuses, all right? Focus on templates and automation. I've made a big point about that. Monitoring your models, not just training them and sending them on their way. All models will need to be revisited. Managing the deployment, doing quality assurance and SDLC, putting that around it. And alerting humans, all right? This is not all about machines. Alerting humans so that they can bring their judgment into place in areas that your AI models find within the organization. All right. I'm coming up on the top of the hour and the end of my slides here. Hang in there just a minute. These are some of the challenges that you will face. Hopefully, I've made a big point about that already, but these are still very early days and things are being ironed out. Okay, for sure. If you're a data scientist today, there's still a lot being ironed out in what you're doing, right? Many data science initiatives work in isolation from each other and the broader business. And that's a problem, by the way, I mean to say.
Data science can require massive volumes of data, which need to be accessed scalably. That's the data architecture I was talking about. Somebody needs to be growing it for data science. It is difficult to measure and manage the value of data science projects, but you must. You must have a framework for it. And senior management, sometimes at least, does not yet see data science as strategic. And this goes back to my open arms discussion. All right. And by the way, let's not get too full of ourselves. Google's new AI designs AI better than humans could. So let's do a great job before Google starts just running every organization out there. So this has been what the aspiring or new data scientist needs to know about the enterprise. And with two minutes remaining, if there are any questions, Shannon, I'll take them. There are indeed. So yes, diving in here. And just a reminder, I will send a follow-up email to all registrants by end of day Monday for this webinar with links to the slides and the recording. The question is, to what extent does a data scientist need to be a firefighter and fix software and business crises? And what percentage of organizations are ready for a data scientist? Okay. Great question. I would say, to some degree, a data scientist needs to be a firefighter, because in some organizations, all that's valued is your ability to put out fires. And we certainly want to be valued, but higher value data scientists are not going to be involved in day to day firefighting. They may get involved in a crisis because, you know, hopefully that's not a daily occurrence, but a crisis by definition means that it's business critical, and data can often be brought to those situations. So that's hopefully not their day to day. But yes, they need to be somewhat trained in it. How many organizations are ready, by my definition, for data science? You know, fully ready, very few, fully ready.
I would say you're looking at 20, 25 percent, but ready enough to where they can hire a data scientist and make them productive, that's probably a good 50 percent, as long as they're willing to stay on the journey and keep getting more ready. We do have a couple more questions that came in. I can shoot them over to you, William, because I'm afraid we are at the top of the hour, but thank you for this great presentation and thanks to our attendees for being so engaged in everything. Just a reminder, if you want to engage with William or with each other afterwards, you can go to community.dativersity.net, and maybe we can post those questions in the community as well. If we can shoot those answers, I'll get those questions in there for you, William. I hope you all have a great day. Again, just a reminder, I'll send a follow-up email by end of day Monday with links to the slides and the recording. Thanks, everybody. Thanks, William. Thank you.