Hello and welcome, my name is Shannon Kemp and I'm the Chief Digital Manager of Data Diversity. We'd like to thank you for joining today's webinar. Today, Adrienne will discuss the disappearing data scientist. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A panel in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #SmartData. If you'd like to chat with us and with each other, we certainly encourage you to do so; just click the chat icon in the top right-hand corner for that feature. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and additional information requested throughout. Now, let me introduce to you our speaker for today, Adrienne Bowles. Adrienne is an industry analyst and recovering academic providing research and advisory services for buyers, sellers, and investors in emerging technology markets. His coverage areas include cognitive computing, big data analytics, the Internet of Things, and cloud computing. Adrienne co-authored Cognitive Computing and Big Data Analytics, published by Wiley in 2015, and is currently writing a book on the business and societal impact of these emerging technologies. Adrienne earned his BA in psychology and MS in computer science from SUNY Binghamton and his PhD in computer science from Northwestern University. And with that, I yield the floor to Adrienne to get today's webinar started.

Great. Thank you very much, Shannon, and welcome to everyone. So, the Disappearing Data Scientist. Yes, it's a provocative title, and that was the intent.
Like many of you, I've been reading almost constantly about how important data scientists are and how few there are and how much money they're making and all that exciting stuff. So I thought I would talk about how long we're going to need them and what the future looks like. So let's dive right in. Just hoping to get to the next slide. There we go. From The Guardian, one of my favorite newspapers, it used to be The Manchester Guardian, in their career choices section, they had an article called, What's a Data Scientist and How Do I Become One? This is really sort of typical of the things that have been in the press for a while. We'll talk a little bit about what the roles are and where it goes, but what I wanted to point out here is their headline, which says there's a shortage of data scientists, companies looking for programmers and analytical thinkers to plug the gap. And my favorite was, the next three years offer a veritable gold mine for data scientists. Before you get too excited, the fine print at the bottom will tell you that that was published just over three years ago. So what comes next? What we're going to talk about today is: what is a data scientist? I assume most of you have some working definition, but I want to focus on the things that I think are important in that role, and then talk about how the role is going to change based on technology, why that's inevitable, why it's already happening, and what the role of the tools will be going forward. Then we'll talk a little bit about the tools that are out there to augment or automate data science. Let's get right into it. When I think of data science, I think of a discipline, and a person who is a data scientist is someone who can identify and interpret business needs. Basically it means it's a person who can talk to business users who have requirements, and who understands that there is some data, or that if they had the right data, they could make some decisions.
Generally it's all about making decisions and discoveries. There are a lot of D words here today. But a data scientist is someone who can not only formulate a problem statement based on a business requirement, but identify the appropriate data, understand what that data is in terms of form, and prepare it for analysis, whether that's cleansing, whether that's aggregating, whether that's sampling. It's a set of tools and techniques that they have to have available to them. And then analyze the data. And for analysis, if we're talking about professional data scientists, they need to have at their disposal a variety of tools. And one of the hallmarks of the professional is that they know which tools to use in which circumstance. You may be able to get away with using the wrong tool; we'll talk about that in a minute. In some areas, the use of the wrong tool is going to be pretty obvious pretty quickly. But when you're dealing with numbers and analysis, and largely statistics, probability and statistics, it's often the case that the results look perfectly valid but are meaningless. And so the important part of being a data scientist, or to me one of the most important parts, is being someone who can find the right tools, find the right approach to the data, analyze it in a way that makes sense, and then interpret that. And the last one: tell the story. A data scientist really is a business storyteller who's telling a narrative based on quantitative data. So in terms of education or skills, you need a fair amount of probability and statistics; it's about mathematical techniques. I think it's important to have some background in experimental design, because a lot of times the types of tasks that are being handed off to data scientists are research tasks, and you want to know if one thing is better than another, or if one thing implies something else, or if there's correlation or there's causation. So it's setting up that process.
So it's being able to understand the data, understand the appropriate process, and then communicate the results in a way that makes sense and enables a business user to make a decision. So let's look at a couple of the different extremes. If you've been on any of the other webinars where I do a lot with machine learning, you'll see that there's an overlap here. So look on the left and think of these two circles as cycles, like the face of a clock, where the red box is the starting point. So on the left side, we're going to start with a problem definition. A business user has a problem and communicates it to the data scientist. The data scientist goes through the data discovery, preparation, modeling, analysis, interpretation, and then preparing the results. That's probably the one we think of most, where there is some problem and we want to solve it, and a data scientist can be thought of as a professional who comes in to interpret the need. The other side is a little different, similar in some ways, but it's going to use a different set of tools. In that case, we're going to start with data discovery, through the preparation, the modeling, et cetera, looking for patterns that we didn't know existed. So on the first side, I want to know what happened in the Southeast region in terms of sales, and can I use that data plus some weather data to predict what's going to happen in the Northwest next year. I know what the problem is. On the right side, we're looking for discovery. So it's kind of like the difference between, I like to think of it as the difference between science and engineering. On the left side, we know what we're looking for. On the right side, we're looking for something new. We're looking for patterns. And the reason we would employ the services of a data scientist in this area is that generally we're talking about a lot of data. And so it's not necessarily a simple thing.
It's not a case of just plugging data from your stores into a spreadsheet and doing a quick model with a couple of macros. What you need to be able to do is integrate data from a variety of sources and figure out what's important, what's contributing to your results. So the left side is: here's the problem, tell me what the answer is. The right side: here's the data, tell me what you think. And in terms of the relationship between that and machine learning, what we're talking about on the left side is basically what you'd think of as supervised learning, where we have some sample data, we know the relationships between the data, and we're looking to classify things. On the right side, we have a lot of data. We don't know what's in there. We don't know what the relationship is, or what relationships are, between different segments. So those are a couple of different modes that both draw on the underlying education and skill set I just described for data scientists. What I'd like to do now is spend a couple of minutes talking about what I think of as the natural order of things, because that'll set us up for where I think things are going and why some aspects of the data science role are going to disappear. And my basic premise is that for many technology-driven roles, the natural order is that we go from very heavy reliance on humans, to humans augmented by technology, to humans possibly being replaced by technology. I'm going to give two examples; the first, obviously, is telephone operators. If you looked at the rate of adoption of telephones worldwide, you would be able to track back to a time when the ratio of telephones to operators, sorry, the ratio of operators to telephones, was pretty high. We needed a lot of humans in the loop to make connections between people on the phone.
The rate of adoption of telephones was growing fast enough that if you were to just plot that out, before we even talk about data science, a simple plot that said, okay, the population is growing at x rate, the adoption is growing at y rate, there would come a time in that graph when every human being would need to be a telephone operator, and after that you would have a real problem. Well, of course, if we think of a telephone operator as a profession, which it was for decades, that didn't happen. But if we think about it as a role or as a task, then it did happen, because basically we are all telephone operators. Every time you make a call, you do more work today to make that call than you did in the days when you could just pick up the phone and ask an operator to do it for you. The technology enabled us to replace the human element, for better or for worse, with a distribution of the task to the people who would benefit from it, namely the people who were making or receiving phone calls. So the scope of the problem in this case was huge. Without that massive change in the technology, it was going to be an intractable problem. The technology was, by today's standard, relatively simple, but the impact was high. And the key to this is just that we changed the distribution of work, if you will, and substituted technology for human labor. One way of looking at it is that it de-skilled the job. That de-skilled word is very important. I'm assuming that most of the people on the webinar today have not been telephone operators; that hasn't been a big employment opportunity for decades. But my next example is very relevant to those who are using technology today: it's programmers. If you were to look at the field of programming in the 50s, 60s, and 70s and see what happened in terms of the requirement, the demand for programmers, you would see that the cost of programmers was high. The demand was outstripping the supply.
But around the 60s and 70s, we started to see some major changes in the way programming was done. What I'm talking about here is things like going to higher-level languages and going to processes like structured programming. And the reason I bring this up is because, unlike the telephone operator, programmers haven't gone away. Certainly there's a high demand for them today. But if we were still coding everything in assembly language, not very much abstraction there, we would already be at a point where pretty much everyone would have to be a programmer. And that was avoided, if you will, by changing the tools. So it changed the roles. And now the majority of people who are programming are using higher-level tools. They're using higher-level techniques, from structured to object-oriented to agile, a lot of different things that work together. Sometimes they're complementary. But basically, while the nature of the job may look similar, you're writing something that ends up in code, the tools that are being used provide a lot more power to the individual programmer. We also see in many cases that tools have been developed to allow end users, business users, if you will, to generate the functionality, whether it's by using tools that generate code or that just interpret the user's requirements in such a way that programming isn't required. And I will just give you one point of reference. Back in the 70s, there was a book by a sociologist named Philip Kraft on the relationship between programmers, managers, and work in the U.S. And his premise as a sociologist was basically that structured programming was, I don't want to overstate this, but it was pretty close to, a plot to de-skill workers. So if you looked at the skill level of the individual that was required to be a programmer, that certainly changed. And what's happened now is, over the years, the skill level required to write the software has changed. You need a different skill set.
You don't need to know as much about the underlying machine because everything is at a higher level of abstraction. But the end result, if we were to measure it in terms of productivity, is that what is being delivered by programmers today is much higher than what was being delivered by the folks who knew more about the underlying structure years ago. And I use those two examples to set the stage for where I think things are going with data scientists. The issue today, and it's well known, is that trained data scientists are in high demand. I was going to have a slide with all the crazy ads that I've seen: become a data scientist in eight weeks or six weeks, or we can do it in four weeks, or learn Excel and become a data scientist. What's really happening in those cases, I think, is that the definition of data science is being watered down. I will say that data science is one of those terms that I wish would go away. Anyone who is a scientist in the natural sciences or the social sciences probably has the required skills to do this analysis using data, and the recognition of the relationship between patterns in data and experimental design. So someone who is a data scientist but isn't a scientist in another field is kind of an interesting anomaly in the greater scheme of things. But what's happening is we're reaching the point where there's so much data, and so much expectation that we can analyze that data, that people are trying to, I won't say glorify the role of data science, it's a very important role, but trying to distribute it to people, and create tools that are analogous to self-service, where you dial the telephone yourself, or higher-level languages in programming, or application generators, where some of those skills that we talked about are now being assigned or delegated to machines. So let's take a look at those trends.
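As a caricature of that delegation, think of the core of an application-generator-style tool as a handful of rules mapping a problem type and a couple of data properties to an algorithm family. The problem types and suggestions below are invented for illustration; real self-service tools embed far richer logic than this sketch.

```python
# Hypothetical sketch: a rule-based "algorithm recommender" of the kind a
# self-service analytics tool might embed. All names here are invented.

def suggest_algorithm(problem, labeled, n_classes=None):
    """Map a (problem type, data properties) pair to an algorithm family."""
    if problem == "predict_category" and labeled:
        # Classification: the number of classes narrows the choice further.
        return "two-class classifier" if n_classes == 2 else "multiclass classifier"
    if problem == "predict_value" and labeled:
        return "regression"
    if problem == "find_groups" and not labeled:
        return "clustering"
    if problem == "recommend_items":
        return "recommender"
    # Anything outside the rules still needs a human data scientist.
    return "escalate to a data scientist"
```

So a churn question with labeled yes/no outcomes would route to a two-class classifier, while unlabeled customer data with a "find groups" intent would route to clustering, which is exactly the kind of decision a business user never has to see.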
The natural progression, following from the telephone operator example to what is probably a more direct comparison, programming, is that to begin with, we would have a business user, and IT, typically in a large enterprise, will maintain control over the data, which makes sense for things like security and access and being able to recover from disasters, et cetera. And the business users have to interface with IT by putting in a request. You may have a business analyst in there, somewhere between the two, maybe somebody who is in business but speaks IT, or someone who is in IT but speaks business. Moving to the popular model today, which is to have a data scientist who has access to the data. It may still be owned, if you will, by IT, although sometimes we're having localized repositories of data. But the data scientist is the interface with the business user. Where it's going, and this is the model that I hope to show you in the next 10 or 15 minutes is almost inevitable, is that the data will be the central focus. IT may still control it, but the business user will have control over the management, the update, and the analysis, and it won't require, in general, there are all those exceptions, the direct intervention of IT or a data scientist. And how is that going to happen? Well, one of the things that I always try to make very explicit is that in any field, there's probably a selection of tools that you'll have to use across your career. And this happens to be a subset of the tools that you would have to use if you were working on something like an old MG. Yes, it's my garage. And it's important to know which tools to use when. It may look like you can substitute one for another, but there's a big difference between using a brass hammer, which won't spark in a gas environment, and a rubber mallet.
So you need to know, or the role needs to know, whether that role is being performed by human labor or digital labor, the digital workforce, which tools to use at which time and how to use them. So I use the phrase: it's the tools plus the knowledge. You can't have one without the other. I refer back to Tom DeMarco's quote that a fool with a tool is still a fool. You need to have that explicit knowledge to be able to select the right tool at the right time. In data science, the tools that we're looking at are things like all the different algorithms, the different processes, the different technical tools. I'm just using the Machine Learning Studio from Microsoft Azure as an example because they have a nice representation here that lays out a lot of different tools and what they're for: clustering tools, recommendation tools, regression tools, et cetera, and what types of data. It's a nice visual that, if you are a data scientist or an aspiring data scientist, should make a lot of sense to you. You know, when you're trying to solve different technical problems, which one of these you're going to use, a boosted decision tree or a decision forest, or feature hashing in text analytics. That's almost never intuitive to a business user. And so the interface between the person with the business analysis problem and this set of tools is today typically the data scientist. What we want to look at is what is going to enable us, what is starting to enable us already, to change the tools that are available, to diminish the need for those specific skills. The forces that are driving self-service data science are pretty simple. The issue is that the growth in data is indisputably very high. I don't need to quantify it.
Everybody would understand that between deep-structured data and surface-structured data, what's usually called structured versus unstructured data, natural language text, et cetera, more is being produced every day. And there's an expectation that since the statistical tools and the analysis tools, or algorithms anyway, are available to solve problems, any business that has the data should be able to use it. So it's the data growth combined with the lack of skills in most enterprises that is creating this demand. And what works along with that in terms of creating the opportunity, so it's issues plus demand, is that more and more in business, the spend that we would traditionally think of as going to IT, anything solving informational or data problems, is being controlled by the business units rather than being allocated to IT and then controlled and parceled back to business, if you will. And on the supply side, what is creating the opportunity, so we've got the demand, now the opportunity, is that a number of artificial intelligence technologies are maturing to the point where they can augment the business analysis requirements and in some cases automate them. In some cases we can see ahead that it's going to be automated, but right now it's going to be augmented. So we want to look at that process. To do that, we need to understand which of the major tasks of a data scientist can be automated: identifying and interpreting the business needs, identifying and preparing data, analysis, and interpretation. And with interpretation I would include storytelling; I should probably make that explicit. And before we go on to the next few slides, I want to just have you think about this one. So, identifying and interpreting the business need. Typically, if we're dealing with a business user, let's say a marketing manager who's talking to a data scientist, they may have a specific requirement: I need to be able to do this kind of a forecast.
I need to be able to do this kind of a root cause analysis, and have the data scientist put that into a plan, create a model: this is what we're going to do. And what's important to understand is that the way that process typically works is it starts with a conversation. So the business user communicates in natural language. Maybe they're having a conversation. Maybe they're writing it out. Maybe they have some form they have to fill out. But the input to the data scientist is natural language. And the skill that the data scientist has to have at that point is to be able to understand those requirements, which in NLP terms is natural language understanding. You need to be able to classify what that problem is. Is it something that you've seen before? Is it something new? It's a classification issue here, based on the natural language understanding. And then match that to the data. So the first step is natural language understanding. Okay. A lot of technology has been developed to do that. We've talked about it in a number of the other webinars. But just as an example, there are dozens of systems out there today, ranging from what I think of as AI-based chatbots to more comprehensive natural language understanding, all the way up to something as sophisticated as the IBM Debater system that was unveiled about a month ago to classify and understand arguments and understand nuances in those requirements. So that is something where a lot of progress is being made today. Accuracy in understanding business requirements in natural language has certainly improved; you would see a spike in that over the last four or five years. So that's promising. Next, identifying and preparing the data.
If you have already taken that input and put it into some form, a knowledge graph, something like that, where you can represent the request, the problem statement, in a consistent, unambiguous form and understand the concepts of what you're looking for, then you can map that to the data to see if the data that can answer the question is available. That's something that is, I certainly don't want to say trivial, but it is a problem that is becoming increasingly solvable with off-the-shelf technology. And by that I mean that we can model the data that we have, and we can model the requirement, and do a mapping to see what needs to be done. And the analysis: that's where the skills of the data scientist are largely, initially, a case of identifying what type of analysis to apply. If you go back a couple of slides to the one showing all the different tools, that's a function of both the data itself and the problem you're trying to solve. That's something that we can do in a semi-automated fashion now; I'll give some examples in a minute. Then the last part, interpretation and storytelling. This is where a good human data scientist can not only show you the numbers but tell you what they mean, and that telling you what they mean goes beyond presenting a spreadsheet, here are the answers that you asked for, to telling the story. What does it mean? How did we get these results? What is the significance? And I will show you in the last couple of slides that there are some really interesting advances in natural language generation that are already being used in business intelligence tools, or being adopted right now, that will get us to that point.
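That mapping step, checking whether the data that can answer the question is available, can be sketched as simple set intersection against a data catalog. The catalog and concept names below are invented for illustration; a real system would match against modeled metadata, not literal strings.

```python
# Hypothetical sketch: map the concepts extracted from a problem statement
# onto a catalog of available datasets. Catalog contents are invented.

CATALOG = {
    "sales_fact":      {"region", "sales", "date", "product"},
    "weather_history": {"region", "date", "temperature", "precipitation"},
}

def map_requirement(required_concepts):
    """Return (datasets that cover some concept, concepts nothing covers)."""
    required = set(required_concepts)
    # A dataset is useful if it shares at least one required concept.
    useful = {name for name, fields in CATALOG.items() if required & fields}
    covered = set().union(*(CATALOG[n] for n in useful)) if useful else set()
    return useful, required - covered
```

For the earlier Southeast-sales-plus-weather example, asking for region, sales, and temperature would report both datasets as useful with nothing missing; asking for a concept no dataset holds would surface it as a gap before any analysis is attempted.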
So my simple graph here is that self-service data science, which means that the role of the data scientist as an individual is diminishing and being distributed to the people with the problem, is being enabled by tools that combine business intelligence functionality and artificial intelligence to drive the selection of functionality within the BI tools. I'll put this in a simple chart: properties of the data, plus the type of problem that we're trying to solve, plus the use of machine learning, is what gets us to this modern generation of automation and pulling the human out of the loop. And I hope it's obvious that I'm simplifying this somewhat, but this is a trend, and we're already seeing a lot of progress here. So when I say the properties of the data: there's an old saying from Louis Sullivan, an architect from Chicago, that form follows function, and the reason I bring that up is that once you've described the problem, in some cases you've almost by default defined the possible solutions, the way to approach it. So once you have that statement of what it is you're looking for, and you know what data is available, what's in the universe of data, you have the properties of the data, you know what the analysis is; that's going to drive you to a decision. Now, I should point out that in that little arrow that says analysis or user interrogation, what I'm getting at is that a data scientist, after getting the problem specification, may end up looking at, well, how have similar problems been solved, and follow from that. Or, if they need more information, that's the user interrogation. And one of the important things that a good data scientist will be able to do is to actually probe for more information. So if you've been looking at systems like, I've been watching in healthcare, where systems have been developed to recognize, when they have a set of symptoms, that if they had certain other information they could increase the confidence of a diagnosis.
The system will actually ask. It's like a doctor talking to you who says, okay, I think it's one of these three things; if I give you this test, I can eliminate two of them, or I can improve the confidence. The interrogation is that interaction. It's that conversation with the user that's currently largely the job of the data scientist. And then there's machine learning that will improve the process going forward. So the trend is going from structured queries, which may be SQL, that are very data-centric, where the way we interact with the data is to have, frankly, the people who are generating the queries think like the machine, to more natural interaction for this interrogation. So, being able to use visual tools, perhaps not as simple as the pen-and-ink tool here, but visual tools that let you interrogate which databases you want to include and move things around on a chart to decide what path you want to take, to a natural language interface, where a user, instead of speaking to a data scientist, can describe the problem in scripted or unscripted natural language. So the trend is going from having the user think like the machine to having the machine, using air quotes here, think like the user. It's having the machine be able to classify the natural language input into something that enables it to perform the analysis. I'm sorry, let me go back. So that was problem definition, that first step. Then we get into data exploration, and it's very similar. We go from interactive to conversational. And what I mean by that is interactive: stimulus response, input query, and a demand for more information; versus something that is more natural-language-like. And again, it's putting more of the power, or more of the interpretation, if you will.
That role is going to the machine, using artificial intelligence and natural language understanding technologies, largely powered by machine learning, to enable a conversation between a business user and a business intelligence system. So I like to think of this as being more of a distributed analysis, and the simple analogy is that if the tools become powerful enough to interpret the user's natural language description, without the business user having to understand all the different options, if you will, then the person who has the problem eliminates the intermediate step, and that is going to reduce the demand for the data scientists that we have today. Right now, in the marketplace, there are lots of tools out there that do general business analysis. So what I have here is a representative set, or list, of vendors that have products in the market today that are beginning to meet the self-service data science demand. The reason I included these... Each one has one or more of these technologies that are being automated. They're in alphabetical order; don't read anything into the order other than alphabetical. What's interesting to me is that all the traditional vendors that have had analysis tools, tools that haven't required data science skills or offered the more sophisticated levels of statistical analysis, are now beginning to offer the more sophisticated analysis and making it easier for users with less sophistication, if you will, to use it. And so it's all across the board. When I said that you need to look at which aspects of the data science process can be automated: my belief at this point, and I think it's being demonstrated by some of these companies, is that virtually all of it can be automated, which doesn't mean that that's the best solution. I'll get to that as we come to the close of the slides.
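The first step in that conversational interface, classifying a natural-language request into a known problem type, is done in real systems with trained NLU models, but the idea can be caricatured with keyword overlap. The labels and keyword lists below are invented for illustration.

```python
# Hypothetical sketch: route a natural-language business request to a
# problem type by keyword overlap. Real NLU systems use trained models;
# the labels and keyword sets here are invented.

KEYWORDS = {
    "forecast":     {"forecast", "predict", "projection", "next"},
    "root_cause":   {"why", "cause", "decline", "drop", "explain"},
    "segmentation": {"segment", "group", "cluster", "cohort"},
}

def classify_request(text):
    """Pick the problem type whose keyword set overlaps the request most."""
    words = set(text.lower().replace("?", "").split())
    scores = {label: len(words & kws) for label, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unrecognized"
```

So "Can you forecast sales for next quarter?" routes to the forecasting path, and "Explain why margins saw a decline" routes to root cause analysis, which is the classification the data scientist currently performs in their head during that first conversation.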
The last part of it, which I think is still a huge part of the data scientist's job, storytelling, is now being automated. And this is a trend... Really, I would say that the seeds for what we see today were sown two to four years ago. But this is a map showing a number of those products and vendors that have the automated BI tools that are becoming data science tools, and how they map to two of the leading vendors that offer natural language generation tools: Narrative Science out of Chicago and Automated Insights out of North Carolina. You'll see that among the major players, Microsoft, MicroStrategy, and Tableau are offering integration with both Narrative Science and Automated Insights tools. And the reason this is important is that both of those companies are really known for creating products that allow you to generate natural language narrative from data. So if your data comes in as a spreadsheet or a database, something that has been ordered into a surface-structure form, the tools, Narrative Science's Quill, Automated Insights' Wordsmith, and some of the other tools that are out there, can, by understanding what each position in that table means, if it's in a tabular form or in a graph, generate a narrative. And these have been used quite successfully in things like reporting, writing stories based on box scores from sports. If you were to look at the box score from a baseball game, the tools here could generate pretty much a play-by-play from the numbers in the actual box score. You can adjust the tone, things like that, and tone based on context. Well, in the case of data science, what's happening is this is being applied to the results of the analysis. We're going to work from back to front now. Assuming you got the right analysis, having a tool like this enables more than just creating a visualization of the data, more than a pie chart, something like that.
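At a much higher level, that box-score idea comes down to filling a narrative template from structured figures, with the wording conditioned on the numbers and the requested tone. The game data and phrasing below are invented for illustration; this is not how Quill or Wordsmith actually work internally, just the shape of the idea.

```python
# Hypothetical sketch: template-based narrative generation from a
# box-score-like record. Team names, thresholds, and wording are invented.

def game_story(home, away, home_runs, away_runs, tone="neutral"):
    """Generate a one-sentence recap; verb choice depends on margin and tone."""
    margin = abs(home_runs - away_runs)
    winner, loser = (home, away) if home_runs > away_runs else (away, home)
    if tone == "excited" and margin >= 5:
        verb = "crushed"          # a blowout, told enthusiastically
    elif margin == 1:
        verb = "edged"            # a one-run game reads as a close contest
    else:
        verb = "beat"
    hi, lo = max(home_runs, away_runs), min(home_runs, away_runs)
    return f"The {winner} {verb} the {loser} {hi}-{lo}."
```

The same record can yield different stories: a 6-1 result with an excited tone becomes "crushed," while a 3-2 result becomes "edged," which is the tone-and-context adjustment described above, applied here to trivial data.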
So it augments the tools that have been improving over the last decade for visualization. Okay, now we have the data, we can visualize it, and we can hear about it, if you will, as a narrative. I think that's a pretty important step forward. But all of that presupposes that the data it's working from is accurate. If you're generating sports stories, and some of these tools were used for things like recent Olympics reporting, there are sports that perhaps don't warrant the assignment of a high-paid reporter, or even a low-paid reporter, so the story is generated very quickly from the data that comes in. That data is presumed to be accurate. If you're doing analysis of your business's future, everything depends on it. It doesn't matter that you have a great visualization or a great narrative if the data itself was wrong. So, as we start to wind up here before the Q&A, I'll just quickly go over the findings and really focus on the recommendations. The demand for these tools is going to continue unabated, and probably increase, over the next five years. On my first slide, The Guardian said it was going to be fantastic for three years, but that was three years ago, and demand hasn't gone down. That projection was based on research done by McKinsey on the number of data science jobs available and their projection for the number that will be available over the next several years. So demand is going to continue to drive self-service analytics, which is the replacement. But the quality of automation versus augmentation varies widely right now. Automation is when the person is taken out of the loop; augmentation is when the person stays in the loop but is able to do more. That's why I didn't rank the companies in that chart. The products themselves are changing so rapidly that I didn't want to put that out there.
I will say that if people are interested in specific tools or trends and want to follow up with me, I'm happy to do that; you'll get my contact information on the next slide. But basically, the biggest benefits we're getting right now are coming from advances in artificial intelligence: tools and algorithms developed for classification problems, which is really the ultimate sweet spot for deep learning, machine learning, and natural language processing technologies. What I showed on the previous slide is that many of the companies are now moving toward creating storytelling BI tools, if you will. As I said, that's a huge part of being a good data scientist, and the maturity of natural language generation tools over the last five years is really remarkable. If you look at how they can build a compelling story based just on the data and the tuning of the generation algorithms, that part is largely where it needs to be. It's the beginning part that is still somewhat problematic: the natural language understanding that allows you to decide which algorithms to apply. A lot of the tools out there today save time by being able to recommend a specific algorithm, or two or three algorithms, a way to analyze the data. But here's the issue: they can present you with a number of different alternatives and different quantitative answers, and if you add in the natural language generation tools, they can tell you different stories based on different interpretations. If they haven't understood and represented the problem in the optimal way, you're going to get a bad answer faster, and you may not know it. Now, I would say that the overall state of the industry is very encouraging.
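As a rough illustration of what "recommending an algorithm" can mean mechanically, here is a sketch that scores candidate classifiers on a held-out validation split and ranks them best-first. The two inline classifiers (majority vote and 1-nearest-neighbour) and the toy data are deliberately trivial stand-ins, not how any of the products discussed actually work.

```python
# Rank candidate algorithms by validation accuracy. Data points are
# (feature, label) pairs; classifiers are functions of (train, x) -> label.
def majority(train, _x):
    """Always predict the most common training label (a baseline)."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)

def nearest(train, x):
    """Predict the label of the closest training point (1-NN)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def recommend(train, valid, candidates):
    """Score each candidate on the validation split; return names best-first."""
    scores = {}
    for name, clf in candidates.items():
        hits = sum(clf(train, x) == y for x, y in valid)
        scores[name] = hits / len(valid)
    return sorted(scores, key=scores.get, reverse=True)

train = [(1.0, "low"), (2.0, "low"), (3.0, "low"), (8.0, "high"), (9.0, "high")]
valid = [(1.5, "low"), (8.5, "high"), (2.2, "low")]
print(recommend(train, valid, {"majority": majority, "1-nn": nearest}))
# → ['1-nn', 'majority']
```

The point the sketch makes is the one in the talk: a tool can rank alternatives quantitatively, but if the problem itself was represented badly, the top-ranked answer is just a bad answer delivered faster.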
If I were in a situation where my option was to wait six months to a year to be able to hire a data scientist, or to train some people, and I'm not singling out any particular institute that does four-, six-, or eight-week training as being deficient, but if my business depended on it, I would be cautious right now. I would employ these tools as an alternative to waiting. But even if a tool is touted as something that can automate the task, I would treat it as something that augments the role. So the recommendations, pulling it all together: I know I used "citizen data scientist" in the description of the webinar. It's becoming a term of art, and I guess I'm getting to be a curmudgeon; there are so many terms I really don't like at the moment. But the idea of a citizen data scientist is like a citizen reporter or journalist: someone who isn't formally trained in a discipline but is able to carry out some of its tasks by being enabled by tools. Don't think in terms of making everybody a data scientist; that's not going to happen. Think about the productivity of individuals, what their role actually is, and how some of these tools can augment that role. What I would strongly recommend in sufficiently large organizations is to allow the business users, not that you could stop them anyway with their own budgets, but perhaps to encourage them, to use some of these tools to generate different models and alternatives, but to still have people trained as data scientists. Those data scientists would then be able to serve and support more people, because they're reviewing input that has been pre-processed, if you will. So it makes the business user more productive and it makes the data scientist more productive.
And I would say that most organizations by this point, unless it's a brand new organization, probably already have a business intelligence tool, or a number of them, in-house. Start by looking at those vendors' roadmaps for automating these various tasks to see whether they're moving fast enough. I mean, it's a lot easier to say "we have natural language understanding for model generation" than it is to do it right. If you have questions on that, I'd be happy to talk to people offline. Number three: train the users on analysis fundamentals and experimental design. I probably should have put that first. The tools are designed to be easy, and they certainly are getting easier, but if you don't understand what you're doing, at least at a surface level, in terms of experimental design, probability, and statistics, you're not going to articulate your problem in a way that will get the right answer. You can't get around that; you can't democratize data science by throwing tools at a problem when the problem is understanding. These tools are becoming more "intelligent", again, air quotes around intelligent. They're getting better all the time at understanding intent and context, representing that, and using it to guide their algorithm selection. But they're not there yet, and it isn't only the tools' problem; it's the business user's problem to make sure they're getting a meaningful result, not just a repeatable result. So if you're looking at new tools, I think it's important to test the interfaces. Some groups and some individuals are going to have different reactions to narrative versus visualization, and the progression from interactive to truly conversational systems won't necessarily be a simple leap for every user. I would strongly encourage a period of testing before making any commitment to new interfaces.
And finally, although I titled this the disappearing data scientist, I think it's pretty compelling that the demand for fully trained data scientists, with the level of sophistication you see in job requirements today, is going to soften. It's certainly not the case that the job itself is going to go away in the next few years; I have projections ranging from 2020 to 2023. People ask, should I have my child become a data scientist? It's not a bad thing. But personally, I think the better course of study, if you're on the younger side, is to focus on a specific science you're interested in and learn enough data science to be able to apply it. If you're at a point in your career where you're looking to improve your credentials and open up some opportunities, a foundation of business skills plus even a few weeks of data science instruction will probably make you more valuable in your organization. And with that, I'm going to turn it back to Shannon. Adrienne, thank you so much for this great presentation. We've got questions coming in already. If you have questions, feel free to submit them in the Q&A section in the bottom right-hand corner. And to answer the most commonly asked question: just a reminder, I will send a follow-up email by end of day Monday with links to the slides and the recording of the session, as well as anything else requested throughout. So, diving right in, Adrienne: I find it a scary concept to automate analytics to the point where business users are performing their own analysis without the knowledge of an operational research analyst. How would self-service data science achieve the same level of quality? Thank you for the question. I think the more you know about analytics, the more frightened you may be, so I empathize with you on that.
In terms of quality, the real area I worry about, and that's why in the last few minutes I tried to work from the end result back to the front, is this: once we've done the right analysis, the reporting and the interpretation of the results are very well under control right now. The quality is determined by the quality of the model, which is basically a representation of the data scientist's understanding. And I hope I didn't give the impression that you could pull the best data scientists out there and replace them with any level of tool. That is not the case. I do think that the quality of analysis performed by people who are titled, or self-titled, data scientists today varies. If I were to do a quick graph, I would say that the folks who are weakest on the human data science scale are probably the ones who could be most effectively augmented or replaced by automation. The real issue here to me, and I don't want to diminish the concern, I think the person's word was "scary", is that choosing the right tools, and in tools I include algorithms, is typically, for a good data scientist, an iterative process. It's a conversation. And that is more the human side than the quantitative side. That would be my quick take on it. So the thing that's going to get us to the stage where we are less frightened of this and more confident in it is the same thing that gets us there with things like medical diagnostics: the representation, and the interrogation of the user, if you will, has to be sufficiently rigorous. As the feedback the system gives the user, "this is what I think you're asking, and this is why", gets refined, that gives me more confidence.
I often go back to a memorable scene at camp as a kid, when a counselor asked: which is more dangerous, a sharp knife or a dull knife? My immediate thought was, well, it's a sharp knife if it's used improperly, because you can hurt yourself, but it's a dull knife even if it's used properly, because it's not going to react the way it should. I think it's the same thing here with the tools. If you can articulate the problem you need solved in a way that the system, this digital data scientist, if you will, instead of a human data scientist, can understand, then the more you can remove ambiguity by successive refinement, the more confidence you'll have in it. So, Adrienne, how do you see traditional BI differentiating itself from the various practices of data science, which in many ways encroach on BI territory, or is there even a need to do so? Personally, I think the distinctions are going to become less and less important, to the point where what we think of as BI is just part of the toolkit. I don't think any vendor that has been in the business for a while as a BI vendor is going to be able to resist the temptation of branding themselves in the data science world, if not actually moving into it. It's that automation of helping the user of the tool require less sophistication about the underlying technology. It's like the telephone operator I mentioned. It's like an automobile: most of the people I know, when they get in the car and turn the key, have no idea what's going on past that point. And that's okay, because the need to know has been abstracted away, if you will. That's where I see the future here in terms of these tools. Take the BI vendors that have been around for ages. I spent a couple of days with the folks at MicroStrategy recently, and they're on, I forget which version, of their basic tools.
But there's a company that's been around for a very long time, incrementally improving. One of their differentiators was being early to the mobile market, and they're one of the companies using natural language generation and experimenting with AI in a variety of areas. I don't think anybody has the answer today where I would say, oh yeah, I'm a Fortune 50 company, I don't need any data scientists, I can give somebody on the loading dock access to this data-scientist-in-a-box application and we'll get the right answer. We're not there yet. The point I wanted to leave everybody with today is that we are getting to the point where some reasonable percentage of the problems currently being solved by data scientists could be solved by the business users directly with better tools, and the technologies to build those tools are maturing fast enough that I see that happening in the next few years. Well, that brings us right to the top of the hour. Adrienne, thank you so much for this great presentation. Thanks to our attendees for being so engaged; I love all the chat that's been going on and all the questions coming in. Again, just a reminder, I will send a follow-up email by end of day Monday with links to the slides and the recording. Adrienne, thank you, and I hope everyone has a great day. Thanks. Take care. Thank you. Bye.