We're going to start with a few short presentations from the three people I'm about to introduce, looking at the profession and value of data science. We'll show you a day in the life of a data scientist and hope to give you an understanding of why this profession is the next great profession for certification by The Open Group. I'm going to introduce all three speakers at once, in the order of their presentations, to keep the flow going. Martin Fleming is IBM's Chief Analytics Officer and Chief Economist. The Chief Analytics Office is a data science centre of competency, focused on improving business performance and achieving financial goals. As Chief Analytics Officer, Martin leads IBM's initiative to become a premier cognitive enterprise. Next up, we'll hear from George Stark, who's an IBM Distinguished Engineer, statistics and quality. George applies statistical techniques, simulation, modelling, and standards assessment to achieve improvements in productivity, cycle time, and quality for software development and IT infrastructure operations. And third, last but by no means least, Maureen Norton, Distinguished Market Intelligence Professional in the Chief Analytics Office at IBM. Maureen is also currently the IBM global data scientist profession lead, helping to grow the skills and expertise of data scientists. So you'll hear from them one after the other. Please send your questions in along the way in real time, and we'll get to them in a panel session after the third speaker. So first up, please give a warm welcome to Martin Fleming.

It turns out that new occupations or new professions are not created very often. Let me draw on my role as chief economist for a minute: as we look at the growth of occupations over the years, relatively few appear and relatively few disappear. The joke among economists is that the only occupation that's ever disappeared is elevator operator.
It turns out not to be quite true, because if you're ever in any of the House and Senate office buildings in Washington, you'll notice that they still have elevator operators, which is probably more a commentary on members of the House and the Senate than anything else. But nonetheless, this is an occasion to recognize and celebrate the creation of a new role, a new occupation, and a new certification category. So why is this important? Why do we want to take a moment to recognize this? First of all, the data, the analytics tools, and the technical infrastructure that we currently have available allow us not only to measure but also to anticipate events and the performance of our organizations in a much deeper, more meaningful, and more systematic way. It's not much of an exaggeration to say that, historically, much of performance analytics has been focused on the financial sphere, largely drawing on the data typically contained in an income statement, looking at ledger-based data. That certainly is helpful, but it only gets us so far. What we've learned is that there are other sources of data that previously were looked at independently (think of CRM data on sales, opportunities, and clients; ERP data on the supply chain; human resource data), but now we're in a position to integrate those data sets, to take a more holistic view, to understand more deeply the performance of our organizations, good or bad, and to come closer to answering the question: why? Why are we seeing the performance that we're seeing? Why are the results the way they are, and what actions can be taken to improve performance? Those causal inference discussions are among the most challenging that we have, but nonetheless, we're playing with a much better hand.
Secondly, of course, with the advent of machine learning and artificial intelligence capabilities, there are many places where these technologies are helping us improve the performance of our organizations: predicting not only organizational financial performance but the performance of each of us as individuals, anticipating the kinds of needs that clients have through recommendation engines, and understanding or predicting where there are opportunities to partner in an ecosystem. There are, of course, very significant workforce implications. Talent and skills are receiving much greater focus. Certainly, if you talk to any business leader, any CEO, talent is at the top of the agenda. You could argue it always has been, but certainly with much greater intensity these days. Given the vast volumes of unstructured data, the larger bodies of structured data that we can now integrate, and the capabilities that cloud infrastructure provides, whether it's a public cloud, a private cloud, or a hybrid structure, we're moving well beyond where we have been in the past in terms of being able to create business value. That's where the role of the data scientist comes in. It's a bit of an anomaly to have the term "scientist" in the title, because much of what we're focused on here is creating business value as opposed to simply performing science, but that's the challenge of the role: to bring business insight and business acumen together with the talent that scientific training and scientific practices bring to us. Secondly, given the importance of the insight, the measurement, and the value that's being created, ethics and integrity play a significant role. It's incumbent upon all of us in these roles to present the insights and recommendations in an honest, unbiased, fair, and impartial way.
Much as we have come to rely upon other institutions, whether auditing firms or financial teams within organizations, there is now a similar expectation of data scientists around the integrity and honesty of the insights and data they create and present. Obviously, there are pressures from time to time to present results in one fashion or another, and that's where the challenge arises. So as we go forward in building this new capability, both as a profession and as a certification program, ethics becomes an important issue. It matters not only within our own organizations; there are larger public policy implications as well. As you may know, this is a controversial topic in the facial recognition space. We as an organization have been subject to some of the criticism. I'm pleased to say we've been able to respond rather quickly, but the folks at the MIT Media Lab have been particularly active in making the point about being able to recognize both genders, all races, and various ethnic groups, rather than having the data drive everybody to be recognized as a white male. That's just another example of where ethics is playing a role in data science. And thirdly, because of the significance of the role that we play, security and cybersecurity become extremely important. We generate meaningful results, which in some instances in a publicly held firm can have implications for equity value in the financial markets, so security, privacy, and confidentiality become very important, as does creating means by which such insights and information can be protected. Likewise, we are all dealing with vast bodies of human resource data and, in some instances, personal information, and that becomes another place where things can go wrong and damage can be done.
Of course, as we know, governments are increasingly paying close attention to privacy and security. A lot of the focus started in the European Union with GDPR, but this concern and movement is rapidly spreading not only throughout Europe but also to Japan, where the government has placed a great deal of focus on it. Increasingly we're seeing this issue here in the US as well; we're perhaps lagging a little behind some of the other nations, but believe me, it's coming along. So security becomes an important issue. I offer these suggestions in the context of the launch of the data science certification program and the data science profession, which we've been pleased to be a part of. We appreciate Steve and his team's involvement in all of this, and hopefully we'll get all of you engaged in this certification in one fashion or another. So with that, let me turn the floor over to George.

Thank you, Martin. I appreciate that. I'm going to use slides. Martin talked about the key issues that face us on, I'll call it, a model-by-model basis. These are things that we have to evaluate and take into account in everything we do. What I'm going to tell you about is what my day involves as a data scientist at IBM. This is a view of my calendar from February 15th, and you can see that I start at six in the morning and go to five at night, and throughout the day I'm doing, in various forms and with various techniques, the six steps that make up the data science process. Early in the morning I'm going to do a peer review of a data science model that's been developed by our team in India, and that will involve asking the questions that Martin just talked about. What was your training set size? How did you make sure it wasn't biased? What are we doing about security? How did we handle GDPR, and did we make sure we encrypted all the possible data sources? How did we understand the data sources?
How did we build the model? What does the model look like? What does the model say? How did you validate the model? These are all the things we do during a peer review, and then we ask: if we use this, what do we expect the business results to be? The real key here is that the data scientist is a unique combination of business acumen; computer science; and math, statistics, and economics. It's really combining those three areas into a single role that helps the business. So my day consists, as you see, mostly of meetings, lots and lots of meetings. We have meetings to talk about the data sources, and we have to go interview the experts on each one of those data sources. Martin mentioned the ERP systems, the HR systems, and the financial systems; there are also the operational systems. How many incidents are coming in? How many customer complaints? What's in those customer complaints? We also look at the business problem. Yesterday I had a meeting with one of our clients, and the client spent some time explaining his business problem to me. I asked some questions and said, well, you know, we have these options and these options, and then we went around another time. We iterate on the business problem roughly 20 times until we arrive at a document, and the document is really one sentence that describes the business problem from that particular person's point of view. So we ask: What's your role in the business? In what direction do we need to move your KPIs? What's the key KPI that we're after? Do we want to reduce the effort in the process? Do we want to increase the quality of the process? Do we want to increase productivity? Do we want to increase profitability? What's the target variable we're after? What's the scope of the problem? Is it one group? Is it one location? Is it multiple product lines? What are we talking about? And then how much change do we expect to see, in what period of time?
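The questions above amount to a fill-in-the-blanks template for the one-sentence problem statement. The Python sketch below is a hypothetical illustration of that idea, not an IBM or Open Group artifact: the `BusinessProblem` class and its field names are assumptions, and the example values come from the CIO problem statement George gives next.

```python
from dataclasses import dataclass


# Hypothetical template: one field per question asked while iterating
# on the business problem. Field names are illustrative assumptions.
@dataclass
class BusinessProblem:
    role: str       # whose point of view, e.g. "CIO"
    direction: str  # which way the KPI must move: "increase" or "reduce"
    kpi: str        # the key KPI / target variable
    scope: str      # one group, one location, or several product lines
    magnitude: str  # how much change is expected
    period: str     # over what period of time

    def __post_init__(self) -> None:
        # A blank field means that question has not been agreed yet.
        for name, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"{name} has not been agreed yet")
        if self.direction not in ("increase", "reduce"):
            raise ValueError(f"unexpected direction: {self.direction!r}")

    def statement(self) -> str:
        """Render the agreed one-sentence problem statement."""
        return (
            f"As a {self.role}, I would like to {self.direction} the "
            f"{self.kpi} of {self.scope} by {self.magnitude} over {self.period}."
        )


problem = BusinessProblem(
    role="CIO",
    direction="increase",
    kpi="availability",
    scope="the mid-range server environment",
    magnitude="5,000 hours",
    period="the next six months",
)
print(problem.statement())
```

Refusing to build the statement while any field is blank is one way to tell when the iteration with the client has converged.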
So as we iterate on the discussion of the business problem, we end up boiling it down to a very straightforward document that says, for example, "As a CIO, I would like to increase the availability of the mid-range server environment by 5,000 hours over the next six months." Now we have a well-formed problem: we can go out and collect data, analyze it, clean it up, build a model, run some simulations, predict an outcome, and say, if you take these actions, we believe we can achieve the business goal you've defined, based on this model. The next step is to collect some data. You have to figure out all of the disparate systems that you're getting data from, and you have to make sure that those datasets match. They have to match in timing, in units, and in locations. One of the interesting things that I see all the time is people pulling data from disparate systems with timestamps in European time, US Eastern time, and US Pacific time, and the data scientist forgets to transform them all to a common time frame, and the model is not good. Time is an important variable in almost everything we do, and making sure that it's consistent across your dataset is key. Another one that's become really interesting in just the last four or five years is geospatial data and weather data. We now have incredible access to geospatial and weather data on your phone, on the internet, and in publicly available databases, and including that data in your models really ups their viability and utility for making predictions. We did one analysis for a bank in Brazil, helping them figure out when to stock their ATMs based on weather predictions, festival predictions, and other things we got off the internet, mapped to the locations of their ATMs.
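Normalizing timestamps is the step George says is most often forgotten. Here is a minimal Python sketch, assuming each source system records naive local times in a single known time zone; the source names and the zone assigned to each are hypothetical. It uses the standard-library `zoneinfo` module (Python 3.9+), which relies on the system time zone database or the `tzdata` package.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical source systems, each recording naive local times in one zone.
SOURCE_ZONES = {
    "erp_eu": ZoneInfo("Europe/Paris"),               # European time
    "incidents_east": ZoneInfo("America/New_York"),   # US Eastern time
    "tickets_west": ZoneInfo("America/Los_Angeles"),  # US Pacific time
}


def to_utc(source: str, local_ts: str) -> datetime:
    """Attach the source system's zone to a naive timestamp, then convert to UTC."""
    naive = datetime.fromisoformat(local_ts)
    aware = naive.replace(tzinfo=SOURCE_ZONES[source])
    return aware.astimezone(timezone.utc)


# Three readings that look up to nine hours apart on the wall clock
# but describe the same instant.
records = [
    ("erp_eu", "2019-02-15 14:00:00"),
    ("incidents_east", "2019-02-15 08:00:00"),
    ("tickets_west", "2019-02-15 05:00:00"),
]
for source, ts in records:
    print(f"{source:15} {ts} -> {to_utc(source, ts).isoformat()}")
```

Converting everything to UTC, rather than to one of the source zones, keeps daylight-saving shifts out of the joined dataset; ambiguous local times during a clock change still need a policy of their own.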
That was a really cool project and a lot of fun, and it helped the bank save a lot of money, because we could predict the best time to stock their ATMs and which ATMs were going to need how much money when. The next step is really understanding the data, and the best way to understand data is to visualize it. We literally make hundreds of visualizations for every project, looking at the different data sources, their skewness and kurtosis (how big their tails are), and where we might find outliers that would drive different results. You have to do these visualizations on the individual data items, and you have to do them on your model results as well. You look for different transformations. The tried-and-true log and square-root transformations are still good; they still do their job of reducing variation in your data set, and you have to use them. The key is that you can't ignore zero values, right? If your data set has a lot of zeros, you have to be very careful that blanks and missing values don't get treated as zeros when you're doing your modeling. You have to take all that into account. The next thing you have to do is actually build a model, and this is where the computer science part comes in. You generally have to code in Python or R. We use a lot of SPSS, a very cool product that has been around for 50 years, and it does the modeling very well. Basically, there are three or four kinds of models. There are classification models, where you take in a lot of unstructured data and classify it into groups; you can then use those classes to build a forecasting model; or you'll segment your data set into similar groups. In my organization, I tend to look at IT environments and try to measure the similarity among them to figure out what kinds of actions to take. These are all the basics, but then you also have to answer the what-if.
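The zero-versus-missing caution and the log and square-root transformations can be illustrated together. This is a hypothetical Python sketch using only the standard library: blank cells are parsed as `None` rather than `0.0`, true zeros are kept, `math.log1p` (the log of 1 + x) stands in for a plain log so that zeros stay defined, and a simple moment-based skewness shows each transformation pulling in the long right tail. The sample column is made up.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional


def parse_value(raw: str) -> Optional[float]:
    """Blank cells mean 'not recorded' and become None, never 0.0."""
    raw = raw.strip()
    return None if raw == "" else float(raw)


def skewness(xs: List[float]) -> float:
    """Population (moment) skewness: mean cubed deviation / sd cubed."""
    m, sd = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


# Made-up column: two true zeros, two blanks, and one large value in the tail.
raw_column = ["0", "3", "", "12", "0", "250", "40", "", "7"]
values = [parse_value(r) for r in raw_column]

observed = [v for v in values if v is not None]  # zeros kept, blanks dropped
missing_count = sum(v is None for v in values)

# The mistake to avoid: silently reading blanks as zeros drags the mean down.
naive = [v if v is not None else 0.0 for v in values]

log_t = [math.log1p(v) for v in observed]  # log(1 + x) is defined at x = 0
sqrt_t = [math.sqrt(v) for v in observed]

print("missing:", missing_count, "zeros:", observed.count(0.0))
print(f"mean, blanks dropped: {mean(observed):.2f}  blanks as zeros: {mean(naive):.2f}")
print(f"skew raw:   {skewness(observed):.2f}")
print(f"skew sqrt:  {skewness(sqrt_t):.2f}")
print(f"skew log1p: {skewness(log_t):.2f}")
```

Whether to drop, impute, or model the blanks is a separate decision; the point of the sketch is only that the decision is made explicitly instead of by a parser defaulting to zero.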
The what-if comes from Monte Carlo simulation or discrete event simulation that's put on top of the models you're building, so that you can play with the parameters and look into the future. You have to validate the model. Generally, we talk in terms of precision, recall, and F1 score, where we're trying to understand how accurate our predictions are. In the example on this slide, we're looking at problematic servers in an environment, and we want to know, obviously, how many we got right and how many we got wrong when we classified them as problematic or non-problematic. Probably the hardest part for most data scientists is building the story. This is where business acumen really takes hold: you're able to take someone from start to finish and explain, okay, this was the business problem we agreed on, here's the result, here are the actions we recommend, and if you care, I'll walk you through all the technical details of the model. Most of the time, they don't care. So our rule is: be brief, be blunt, and be gone. Just give them the results, tell them why the results make sense, and if they want the details, we'll go into them. But you really need to tailor your message to your audience. If you're presenting to architects or data scientists, they're going to want all those details, and you have to be able to explain them. Managers are mostly interested in the bottom line: what do I have to do now, and what do I have to do in the future? And executives are interested in how much money they're going to make, how much they're going to save, and what business strategy they need. So depending on your audience, the story you build is quite different. So that's a day in the life of a data scientist, going through all of the steps associated with building a model and getting it used in your environment.
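Both the validation metrics and the what-if simulation fit in a few lines of Python. This is a hypothetical sketch, not the team's code. Precision is the share of servers flagged as problematic that really are; recall is the share of truly problematic servers that were flagged; F1 is their harmonic mean. The Monte Carlo part assumes a triangular distribution for baseline outage hours and a uniform distribution for the fraction that the recommended actions remove; those distributions, their parameters, the server labels, and reading the 5,000-hour CIO target as "outage hours avoided" are all illustrative assumptions.

```python
import random
from typing import Sequence, Tuple

POSITIVE = "problematic"


def precision_recall_f1(
    actual: Sequence[str], predicted: Sequence[str]
) -> Tuple[float, float, float]:
    """Score a binary classifier, treating 'problematic' as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == POSITIVE and p == POSITIVE for a, p in pairs)
    fp = sum(a != POSITIVE and p == POSITIVE for a, p in pairs)
    fn = sum(a == POSITIVE and p != POSITIVE for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def prob_target_met(target_hours: float, trials: int = 100_000, seed: int = 7) -> float:
    """Monte Carlo what-if: estimated chance that avoided outage hours reach the target."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Hypothetical uncertainty: six-month baseline outage hours
        # (low, high, mode) and the fraction the recommended actions remove.
        baseline = rng.triangular(8_000, 14_000, 10_000)
        reduction = rng.uniform(0.30, 0.60)
        if baseline * reduction >= target_hours:
            hits += 1
    return hits / trials


# Ten made-up servers: one problematic server missed, one healthy one flagged.
actual = ["problematic"] * 4 + ["ok"] * 6
predicted = ["problematic"] * 3 + ["ok"] + ["problematic"] + ["ok"] * 5

p, r, f1 = precision_recall_f1(actual, predicted)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
print(f"P(avoided hours >= 5,000) ~ {prob_target_met(5_000):.2f}")
```

Reporting the probability of hitting the target, rather than a single point forecast, is one way to make the what-if concrete for the manager and executive audiences described above.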
So now I'm going to hand over to Maureen, who's going to talk about the certification.

Thank you, George. I wanted to start with this: we've been talking about this world of data, and you have no doubt heard many times that it's exploding. How we tap into that world of data is really where data scientists can add so much value. I know NASA is here, so this visual seemed to make sense: data scientists are really pioneers. They're going places with big data, dark data, streaming data, and all different kinds of data to push the boundaries of our knowledge and to gain insight and value from it. One of the things that Martin talked about is the value that data scientists can bring to an organisation, and there are so many different ways that happens. It can be across the enterprise, whether in supply chain or finance, and not just in business enterprises but in nonprofits, using these kinds of skills to tackle all kinds of societal problems as well. So we're excited to have this opportunity to work with The Open Group to recognise this profession, which is really at an early stage in terms of making a significant impact across the board. Data scientists can certainly help with a lot of the optimization problems that companies and enterprises are dealing with, and, most importantly, with connecting the dots. Think about all the data: George explained how you have to clean the data, connect it, get the timestamps right, and so on. But one of the real values we've been seeing is that the data scientist connects data across silos. You might have people working in finance, HR, or supply chain, and the data scientist, if the role is done right in the organisation, is able to see across the silos and connect the dots for a level of insight that can make a significant strategic difference for enterprises.
And because of that, it also helps the data scientist see things from a very different perspective, and literally step back, analyse things, and draw those insights. So today we are helping The Open Group with the launch of the data scientist profession, and it is an experience-based certification. It's not a test that you take and then, you know, earn something; it's based on experience. So that day in the life of the data scientist, where a practitioner has been solving business problems with these methodologies, is now being recognised by The Open Group as our newest profession. We've been working on this for a little over a year. George Stark and I presented to The Open Group board a year ago at the January conference and made a proposal: we really think this is a profession that should be added as an Open Group certified profession. We built a business case for it, and the board was very enthusiastic and gave us the go-ahead to start forming a working group with other companies to define what it really means to be a data scientist. We came with our point of view, and the collaborative effort with The Open Group and all the members was phenomenal, because it really improved the result: it's not just an IBM view of what a data scientist is, but a well-vetted, argued, and debated Open Group view of what a data scientist should do to meet a certain standard. So, based on that, we're adding the Open Certified Data Scientist today. Similar to the architect and technical specialist programs, there are three levels of certification being announced: the Certified Data Scientist, the Master Certified Data Scientist, and the Distinguished Data Scientist. I'm going to give you a little bit of background here. The very exciting part is that it is being done with a step-wise, milestone approach.
So it's not a certification where you have to work away for years and then put together a huge package at the end. It breaks the process down into significant milestones that need to be attained, which makes the final review much better for both the reviewers and the data scientists. A candidate submits a total of five application forms to obtain these milestone badges, and they can be submitted in any order; that really doesn't matter. One of the first to talk about here is professional communication. We've heard from Martin and George how important it is to have the business acumen to really communicate and tell the story. If you come up with brilliant insights but you can't communicate them, or you can't translate them into the right visualization, you're not going to convince anybody to change what they're doing or change their approach. So there are guidelines and standards to meet for professional communication. Likewise, as in most professions, you want people to stay current, up to date on the latest tools, technologies, and approaches, so professional development is a key part of keeping those skills fresh. And then the heart and soul of the certification, the way I see it, is that it's really built on experience. This is where you've got a data scientist who has worked on a project; at the Certified level, that means they may have been supervised by other data scientists. That's the initial level. Level two, Master, is when the data scientist is really leading the projects. And Distinguished Data Scientists are really the thought leaders, having a much broader impact. So that's what differentiates the levels. The experience profiles are really the heart and soul of this: they show that you have tackled a business problem, solved it, and can demonstrate the value that brought.
These are then reviewed at each step of the way by subject matter experts, who determine whether each one has met the requirements. After those milestones have been met, the data scientist can submit an experience application form for certification. The final step is going to the review board: the candidate is evaluated on all of the submitted material and gets a decision. The board recommends certification if the majority agree that the candidate has met all of the requirements. So that's a high-level overview of the process. It's probably going to sound pretty familiar to architects and technical specialists. We took a lot of lessons learned from the other professions, which are much more mature and have gone before us, and they've been very generous with those lessons. We really tried to design this so that people can get started. And one of the key things is that we look at this as just the beginning. We want to grow this profession. We want to develop a body of knowledge that can be used across the profession, and globally, frankly. We also want to do outreach to universities. A lot of universities are now just starting to graduate their first groups with master's degrees in analytics, data science, artificial intelligence, machine learning, all of these great areas. They're doing capstone projects as part of those university programs, with real data, real clients, solving real problems, and those could qualify as one of the project profiles and get them well on their path to becoming certified. That's one way we're looking to grow the profession. Enterprises can also become accredited, using all the standards that have been created here, so that they can run their own program. IBM just recently went through the accreditation process for data science following this work, and we're thrilled to have achieved it.
So we've been building our own version of the program to process the data scientists within the company. Other companies are welcome to do the same and tap into what I think is terrific work that's been accomplished here. And now I'm going to do a little commercial, because, as I said, we really want to grow this profession. We are very interested in a data scientist forum, so we have a workshop tomorrow afternoon, which you are certainly all invited to. We'll start with a fun "Who Wants to Be a Millionaire" data scientist game. Then we have Arizona State University, which is located nearby. They've had a couple of those capstone-type projects I mentioned, with two clients who are going to join them tomorrow, APL Logistics and the Arizona Lottery, and they're going to talk about that. Then Anna Etcheverry is going to take us through the data scientist competency model, which I think people will find really interesting and very useful. So, with that, I will turn it back to you, Steve.

Okay, thank you all three for giving us your insights. We're going to have somebody else join the panel who isn't giving a presentation but will be able to add a lot of insight, and that's my colleague at The Open Group, James Derave. So, James, if you could take a seat. James is now the VP and General Manager for our India operation. Welcome, James. He has a long history of working on certification within The Open Group, in fact going way back to the early days, predating the Single UNIX Specification. He is the Open Group staff person responsible for the Open Professions work group, which runs the programs that Maureen showed on her chart for the architects, the technical specialists, and now the data scientists. So, welcome, James. We have questions coming in on Slido.
I'd like to start with an unsurprising one, given the interest in architecture in this audience. The question reads: how do you see the roles of enterprise architect and data scientist merging? I might change that slightly; it's not my question, but I might change it slightly. Do you see those roles, enterprise architect and data scientist, merging? Who feels comfortable taking that one? Maybe Martin?

Sure. I can see you're looking at me to do this. I'm not sure about the merging point, but the two certainly work very closely together. Consider the advent of much more sophisticated data platforms: think of the Watson Data Platform, a version of which we've deployed internally. The point is being able to bring all the disparate data sources together (George certainly gave examples of many of them, and of course there are others), and not only to do the engineering, which is an important piece of it, but also to do the data governance, which in our experience turns out to be among the most important and most difficult challenges we face. Data engineering becomes important as cloud architecture becomes increasingly available and sophisticated. Having the compute, the storage, and access to the APIs all in one location, to deal with latency and bandwidth issues, is a real challenge, but it's something that data scientists and architects have to come together on. They also have to work together on the issues around governance: one trusted source of data; being certain that when those data are ingested, no transformations are occurring that might create inconsistencies; and having the appropriate set of taxonomies so that those data are viewed consistently across the organization. So it's a very significant set of issues where both teams have to work really as one.

Thank you.
We have questions coming in, so unless anyone is desperate to add anything, I'll move on to the next question. Have any of you put together applications for the certification program yet?

Yes. As part of building the system for IBM data scientists, we've created applications, and I've worked with Christina and other people at The Open Group so that, once all the conformance requirements were defined, we could look at the application process and make sure those applications cover everything we have deemed a conformance requirement.

So you've kind of been the guinea pigs?

Yes.

Good. James, maybe one for you. Maureen touched on it in her presentation and described it as exciting, which it is: the step-wise nature of the certification program, which is a little different from what we have for the other two profession programs. Can you say a bit more about what's different and why?

It is exciting for those interested in certification, but it actually goes beyond that. For those who are familiar with the programs we've been running for several years now in the architecture space, there's a big logistics challenge for the individual wanting to get certified, because the requirement is that they fill in a form. That sounds okay, but the form ends up being about 55 pages long and can take 40, 50, or 60 hours to complete properly. That's a substantial impediment. You've got to find the time and the will, the motivation, to actually get it done. A lot of people do it, but it's also a barrier. So one of the things we were very keen to do in reworking our professions program, which we've been able to pilot with the data science profession, was to ask how we could partition this into more easily digestible, bite-sized pieces. That's why we decided to break it up into milestones.
And then the task was, well, how do we do that without in any way compromising the overall quality and integrity of what it means to become certified? So that's what we think we've achieved. We're very confident we've achieved it. So data science has said it's the pioneer program for this approach, but we are well down the path of the definition and implementation phases for introducing this into the architect profession and into the technical specialist profession and other professions that we're also working on in parallel. As I think you may remember under a circle, yesterday mentioned that the supply chain activity within the Trusted Technology Provider Forum is looking at an experience-based program. So that will use the same milestone-based approach. It's what we're going to do for all of our experience-based certification programs now. They all come out. I'll just hold in there and you'll get the benefits. That was going to be the second part of my question if you're doing it for this. Shouldn't we be doing it for the others? So it's good to hear it's well underway. George, I know when we last met, you gave us a presentation that involved references to things like moneyball and other things. I know you've told me that professional sports is probably the biggest user of a consumer of the analytic data science and the analytics. Can you say a bit more about that? It's a topic that I know interests a lot of people. How do they use them? The book Moneyball and the movie Moneyball introduced data science to the public. Essentially what happened was the Oakland Athletics had a very fixed budget and they had to figure out how many runs it was going to take to win enough games in order to win their division and get into the playoffs. 
All on the budget that they had. Professional baseball has always been very much a skills-based and reputation-based industry, but these guys turned it into economics. They turned it into a data science project: they had a goal of scoring so many runs in a year to win so many games in order to win their division, they figured out how much money they had, and they had to allocate that money across their positions. How much are you going to pay a first baseman? How much are you going to pay a shortstop? Where are they going to hit in the lineup? That was a data science problem, and it has become how major league sports, baseball and football, now operate. In fact, I was telling Maureen that about a month ago there was an ad for a chief data scientist for the Denver Broncos. They've had two losing seasons in a row, they're struggling at the quarterback position, and they're struggling on the offensive line. They're now looking for new ways to apply data science to their budget to field a winning football team. Professional sports has really caught on to the notion of the data scientist.

Thank you. Next question: adding consistent metadata to the data would be beneficial to the data science analysis effort, and The Open Group O-DEF standard should be considered here. Is that something you've taken into account yet, or are about to?

That's certainly something that Maureen and I have talked about for the workshop tomorrow: building the community, and how to use these resources to grow the community and integrate more closely.

I would also add that metadata is a place where we've struggled quite a bit. A large organization has many disparate and diverse data sources, and standards like the ones you're all helping to create can, I think, be applied very productively, particularly in larger organizations where data just seems to appear from everywhere and becomes a management and governance challenge.
I think that really is one of the exciting parts about being here: being able to work with the other parts of The Open Group, find those synergies, and really build on them, which I think will help all the professions.

You've talked about the workshop tomorrow, and you've emphasized that this is the start of the certification program and we need to build on it: things like where the other standards of The Open Group fit, where standards from outside The Open Group fit, and building a community here at The Open Group. That's what we're all looking for, I think. Good.

A question about the program itself: how far back can you go when using experience for the certifications, or does every example have to start from today going forward?

I'll try and remember.

I'll help you.

Excellent. Between the two of us we'll probably get the right answers. This is an important distinction, I think, between the professions programs and the knowledge-based exam programs. Take TOGAF or a similar knowledge-based program, which involves training courses, exams, and a certificate as the result of passing an exam. Yes, it's a statement about the knowledge and understanding that the individual has, but in a sense it's about something new for them. It's about new knowledge; it's a forward-looking thing. These professions programs, based on skills and experience, don't so much look back as look at what you have achieved as the indicator of what you can achieve next. The program is all about experience and skills you have demonstrated in projects to achieve results. So yes, we've got timeline rules in the program: you must have an experience within the last three years in order to get certified, and you can't cite anything older than eight years. I think I've got those numbers right.
But it's really critical to understand that it's based on a clear description, and on interviews and discussions about what you have done. So it's about proven experience, proven skill, and proven ability to deliver results.

It's eight years, right?

Yes, eight years is the length of the window. If you look at the milestone approach, one of the experience milestone badges has to be about a project that was completed within the last three years. So it has to be current. You can't get data science certification if you gave up practicing five years ago; it won't work.

Right. And as with architects and technical specialists, recertification will be required every three years. Again, that's to make sure they are practitioners who keep their skills current and continue to drive results.

A very related question: if I'm an aspiring data scientist, how do I start working towards my certification? An inevitable question, I think.

That's a great question. Well, you can go directly to the opengroup.org website, which now has information on how to get started with the data scientist profession. I guess it would really depend on where they are in that aspiration. If they're just thinking about how to gain the skills to become a data scientist, there are a lot of university programs now that have instituted master's degrees in analytics or data science. You don't have to go to one of those to become a data scientist; people come at it from a statistics background, from economics, and in other ways. What I really like about the certification is that we don't dictate how you got those skills. It's experience-based, so you could have been really enthusiastic, taken a lot of MOOCs, and just been coding for a long time because you love it.
If you've been solving problems and can meet the conformance requirements, it doesn't matter how you came to those skills, and there is a vast array of ways you can acquire them.

Okay. Just to add there: Maureen, you mentioned looking at the documentation on The Open Group site, and that is so important because that documentation is the definition of the profession as agreed by The Open Group membership. You'll read the conformance requirements. They describe, in very clear language and with a rationale, each skill that we believe a data scientist should have. If you're aspiring to become a data scientist, those descriptions define the skills you need to build into your work. They're the things to look for in yourself: do I have this skill? If I don't, how can I acquire it? If I think I do, how can I demonstrate it on my next project? The other really useful piece is that, because we split the program up into milestones, there are separate application forms. They're a bit dry; it's an application form for an experience milestone. But when you read the questions, they tell you what is important about a data science project from a data science perspective. So it's a really good set of signposts to what you need to do, what you need to think about, and what skills you need to acquire, develop, or enhance in order to meet the standard of a professional data scientist.

One of the other things we did at IBM is create success profiles. We did some research with some of our most successful data scientists, at both junior and senior levels, and took a look at what made them successful. We found everything from very personal attributes, such as simply being curious (these are data scientists, and scientists are inherently curious; they like to explore and figure things out), to problem-solving skills.
So looking at that is also a way for people to figure out whether they are suited to it.

Something you've touched on just now, Maureen: what kinds of backgrounds do you typically see in the data scientists you have at IBM?

So now we're into the challenge of recruiting and hiring, and there are a range of challenges that we face. Let me address the entry level first. We've had a lot of success with folks coming out of MBA programs, because they have a combination of analytic skills and business acumen. In and of itself, that's not necessarily enough, but it is certainly one profile of the kind of folks we're looking for. As Maureen referenced in her remarks, a number of universities have now started master's programs in data science, and I think we just recently hired our first few folks who completed such programs. So the university community is beginning to help us in that respect. We've also had very fruitful activity recruiting folks out of PhD programs across a variety of fields. My favorite example is a young woman who joined us after completing a PhD in astrophysics. That's a unique combination, but she's a brilliant young person and has an enormous amount of experience dealing with quite large data sets because of her astrophysics background. We have a large number of folks who have completed PhD programs in economics, sociology, physics, computer science, astrophysics, and a number of other fields. We've found that helpful simply because of the need for talent and the intensity of the competition.
Another place we've looked: we've started what I think of as a bit of a minor league program. We have a large summer internship program, and one of the groups of interns is rising juniors in undergraduate programs, who can join us over the course of a couple of summers. We've had success adding folks out of undergraduate programs as well. They're younger, so they can do less, but nonetheless it helps keep the team in good balance and lets us have folks in a number of different areas. At a more experienced, more senior level, there are certainly a large number of folks with experience in computer science, finance, and other fields who over time become knowledgeable about data science and all of the issues, challenges, and struggles we face, and who in turn grow into the profession and the roles. Those are also very useful sets of experiences to have, because much of what we do is on the business side, trying to get organizations to change behavior. So in any event, that's where we've had some success.

Thank you. This one's probably for you, George, but anyone else is welcome to chip in. What tools do data scientists use in their daily work, and how does the data scientist use agile development and methodologies in this profession?

Tools are pretty personal, but in general it's Python, R, SPSS, and SAS; those are the big four that come to mind. The CRISP-DM process is agile by nature; it's very iterative. The steps I had on my slide are business understanding, data understanding, data cleaning, modeling, validation, and storytelling, but those don't happen in a straight line. They happen very iteratively and very quickly, one after the other. So the process that data scientists use is agile: the milestones, the stories, the hills, the valleys. It's very agile.
To complement what George is saying, we've seen recently, and by recently I mean within months, certainly within the past year, the advent of notebook-type tools like Watson Studio. Microsoft, Amazon, and all of the vendors have some sort of notebook-like tool, and these have become increasingly useful for bringing together the SPSS-type and Python-type capabilities and locating the code in one place. From my point of view, as somebody who has to lead a large team of data scientists, it adds to the productivity and efficiency of the team, because folks don't have to go searching. You could use GitHub, and there are other approaches, but it's one way of keeping the teams organized, so the Watson Studio type tools have become very important. Also, to the point George is making about the agile method: if you come to Armonk, and everybody's welcome to come and visit us, there are teams of folks standing around having their daily stand-ups and going through a very organized approach with, as George says, hills and mountains and deliverables, and they've got the squads and all the rest, including the retrospectives. So just like a software development team, we adhere to a well-organized agile approach.

In our group we use cupcake, birthday cake, and wedding cake as the hills.

Right. Okay, thank you. Through building the certification program and collaborating with others, did Maureen or George learn anything new about the profession of data scientist?

We did. We learned about all the different perspectives that people had about data science, and I thought the collaborative process was just fantastic to go through, because we thought we had it all figured out, and then we started working with other companies who brought a different perspective, and we had a lot of debates. I'm trying to think if there was one thing.

The big thing for me was the distinction between consulting data science and operational data science, and so we spent a lot of time
trying to figure out how the profession applies to each. When you're a consulting data scientist, your final output is basically a presentation with a set of results and recommendations, whereas when you're an operational data scientist, your final product is a set of code that you're integrating into a legacy system that helps make decisions on a regular basis. The certification had to cover both kinds of data scientist, and the original version we did didn't, so a lot of the debate and discussion was about how to fit it to both sorts of professionals.

I know that when we've done these things in the past, what we tend to find is considerable overlap in the perspectives and in the way organizations have their own professions organized, but also differences, and if we can pick the good bits from everyone, we get a better program.

Absolutely. You can look at what you started with and where you ended up after having it vetted by so many different people with all those perspectives, and it's a much better product after it goes through that process.

Absolutely, that's great. How can we grow the need for additional data scientists within my organization? How do we make the business case for joining The Open Group data scientist forum and certification?

This was a challenge we faced a number of years ago, probably to some degree even before I took on the role. You have to find opportunities to begin a few small initial pilot projects, I might call them, to demonstrate some value. You can talk theoretically about data, data scientists, and the opportunity to improve performance, but until you find a business leader who understands and is willing to have enough faith to support a small team that goes off, launches into a piece of work, and demonstrates where the value is and how that set of analytics is going to be translated into business actions, improvements, and value delivery in terms of improved revenue
and profitability (or, in a governmental organization, other performance metrics), it's hard to get very far. We oftentimes get into discussions where people say, it's easy for you guys to do that, you have all this data; our data are a mess. Our data are a mess too, and the challenge is to work along parallel paths: bring sufficient data together, and be clever enough to identify a spot where there are data and where there are meaningful business problems that can begin to demonstrate some real initial value, and then build on that success over time. We've had a couple of conversations with teams that have said, we just went out and hired five data scientists, what should I do now? My answer is that's wrong; you just made your first mistake. The first thing is to identify the business problem and the executive who wants it solved.

That's an almost identical answer to an almost identical question I've asked at events like this about enterprise architecture: how do you make the case for enterprise architecture? It's a very similar answer. You've got to show value early; you've got to demonstrate that there's something here.

Absolutely.

Good. Okay: how many data scientists does IBM have, and is this profession a strategic goal within IBM and its customers?

That's a question Maureen worked on a lot. It hasn't been a profession, so the true answer is we don't know. Our best estimate is that out of a little more than 350,000 employees in IBM, about 15,000 are data scientists, spread across many different business units, geographies, countries, and locations. And yes, it is a very important strategic role and a place where we want to grow our talent and our skills.

Okay, thank you. James, do you want to say a little more about what makes up The Open Group data science profession standard? I think Maureen showed a
slide on it, but how would you describe the standard itself?

It's built from some fairly clear and obvious sections. We have a section on core basic skills. One of the interesting outcomes from our rework over the last couple of years has been reaching the conclusion that across the technical specialists, the architects, and the data scientists, these core basic skills of professionals are the same. Not just similar, the same, so the wording is identical. Those cover the basic functioning of a professional, things like professional communication and conflict resolution skills; there's a whole set of them that we believe are the same for professionals in this industry. Then we obviously have profession-specific skills, things that are particular to data scientists, or to an application developer, or to an enterprise architect. And then we have experience requirements, which are also particular to each profession, because they're focused on the nature of the projects that people undertake within those professions. So we start by looking at the life cycle: what does my day look like, or what does a project look like, for a data scientist? What does a project look like for an enterprise architect? We describe that, and we build into the application forms the questions about demonstrating those particular skills within the context of a project or a piece of work. We've structured it to make it a lot easier for individuals to think about how they've demonstrated a skill. If you think about a data science project and look at the application form for the experience milestones, it'll help you think through what you did. It'll ask you what key decisions you made, and you've got to think about what those key decisions were; the ability to articulate them, describe them, and talk about them is part of the process of applying. So that's the structure of it, and why
we think we've got something really quite useful here. The other point, which I think we haven't mentioned: Maureen mentioned that IBM has achieved accreditation, which means we've assessed the IBM program, the way it's operating and the way it's constructed, so that IBM can operate that program internally and notify us of certified people when they've reached the milestones and the final certification point. That's available across the whole piece, for the specialists and the architects as well. But if you're thinking, as an organization, about building a data science profession, about regularizing it in some way, building it into your career model and the way your HR thinks about hiring, developing, and promoting people and getting the best out of them in this space, the material we've published is best practice from the industry, reviewed by all Open Group members, on the important skills and experiences for a data scientist. It's a really useful starting point for capturing that within your organization. I think it's inevitable that organizations will have additional special requirements for their profession, which they can then add to that common corpus. So it's a very, very valuable, accessible resource for all of our members and the public. These things are published; they're open for anyone to fetch and download.

Okay, thank you. What's one piece of advice you would give to folks considering a career in data science?
Do it. If they've got the desire and that curiosity, are quantitatively inclined, and like to really solve problems, I think as a career path it has so much promise. The technology will continue to evolve and change; we certainly have machine learning, neural networks, and other artificial intelligence technologies, and the path looks very exciting going forward. So I would definitely encourage people who have those core qualities; I think they could be pretty happy in a career as a data scientist.

It's interesting: I had an Uber driver tell me recently that he was driving but was also in a job that wasn't everything he wanted it to be. He really had a passion for data science and thought that's where he belonged, but when he looked at how to get into it, it was very, very difficult. There were boot camps he could go on, but they were very expensive and he didn't have the money for them. I told him to watch this space for a possible way of doing it without that, but to start with some of the companies that were hiring, because it's a very in-demand skill and you never know where you can get in. It was just one conversation, but his feeling was very much, I don't think I can make that leap without having done something to show I can be a data scientist. I said I think it probably doesn't matter what your background is; if you demonstrate the right attributes, you've got a chance. Anyone else want to add to what Maureen said?

I'm going to echo Maureen: do it. I've been a data scientist for going on 35 years now, and the beauty of it is that I've solved problems for the Department of Defense, I worked at NASA for a while, and I've spent the last 20 years at IBM working with banks, food service companies, and all sorts of unique industries. They've all wanted me to help them solve business problems, and it's really cool that you're not doing one thing for your entire life and you're meeting all
these people across all these industries. It's really fun.

The other point I would make, which hasn't arisen so far in the conversation, is that ultimately, to achieve success, from my point of view as somebody who has to lead a team, we have to be able to change behavior. It's not just creating algorithms, deploying algorithms, or making algorithms available; it's actually putting them into practice and changing the behavior of an organization. The problem you run into is that you have to deal with people. If we didn't have to deal with people, we'd be much better off, but it's the change management that really becomes the most difficult part: the resistance we all naturally have, whether you're in an organization or not. Nobody likes to have to change, but we have to; that's the reality of where we are. So my advice would be to understand that doing the analytics and deploying the solution is part of the challenge. The other part is seeing it through to completion, actually seeing the change in behavior and the changed outcomes. That's where it gets very creative and innovative: how you get people to change.

And along those lines, you really can't care who gets the credit. There's an amazing piece in the movie Moneyball where our hero Brad Pitt and our data scientist Jonah Hill are getting pushback, basically being blocked by Art Howe, who was the manager. Art Howe owned the lineup card, and he wouldn't play the guys the data scientist wanted him to play in order to achieve the goals, so they had to work their way around him. In the end it worked: the A's won their division, and they actually set the American League record for the longest winning streak. But at the end, what you hear is the announcers on TV and everybody saying what a great job Art Howe did with this group of misfits. Even though he was the roadblock, he got all the credit for the success, and the general manager and the data scientist didn't care. They were successful, and they changed
the culture of the organization. They were successful by causing the change.

Yeah, absolutely. Okay: do you work with other IT relationship management teams to support the business, or are you providing the data science service largely independently?

Oh, no. The way we're set up is perhaps a little different from other organizations. We have a CIO for the technology infrastructure, of course. Given the size of our organization, we've also split the roles between a chief data officer and a chief analytics officer. I have the chief analytics officer role, which is very different from the chief data officer role; the chief data officer is very much focused on the tasks I referenced earlier, the data engineering and the data governance. Now, there are many organizations that integrate the two. In the financial services sector, because of the regulatory changes after the Great Recession of 2008-2009, many financial services organizations created chief data officers, and then of course the senior leaders realized, we're collecting vast amounts of data for the regulators, maybe we can get some business value out of this as well. So many of those teams added analytics capabilities. It's going to differ a little across organizations, but those three functional areas, infrastructure, data, and analytics, really have to be integrated in some form or fashion, and we do it by working together as peers.

Okay, thank you. Have you noticed any connection between the growth of IoT and the demand for data science? In other words, is the volume of data from IoT driving the demand?

I would say definitely. You must be seeing a lot of that. The Internet of Things is what's really driving the collection of, and access to, so much more data than we've ever had before, so I think they're directly correlated.

Well, we had a great example with Watson. There's a bicycle race every year that starts in Long Beach, California, and goes to Manhattan Beach in New York, and we instrumented one of the bikes
and one of the riders. He actually owned a bike shop; he wasn't a professional cyclist. Everyone else in this race was a professional cyclist, but he was in great shape. He was a 45-year-old guy, and everybody else was in their 20s. We instrumented him with the Internet of Things, he sent the data, and based on Watson we were telling him when he needed a rest, when his performance was declining, and when he had to go for it. His goal was just to finish, but he actually took second in the race, which was amazing, and it was because of the data science, the Internet of Things, and the amount of data we collected that drove him and helped him. Think what we can do for an organization.

Exactly. Okay, thank you. The next question was probably triggered by something you said, James, in describing the commonality between the professions: can certain badges earned be applied to multiple professional certifications, e.g., can a core skill badge be applied to both data scientist and enterprise architect?

Not quite. We did look very hard at that question. It would be really nice if the badges could be the same, but the conclusion was that the experience demonstrating the skills, and the requirements to achieve a badge in one profession, need to come from the practice of that profession. For example, if you look at the professional communications badge, the requirements are the same, but the evidence must come from the exercise of your profession as a data scientist, a technical specialist, or an architect, because we feel the nature of those communications is different. Just because you can talk very lucidly about data engineering, or about configuring SAP to perform a particular role, doesn't mean you can communicate well as a data scientist. So we have kept them distinct. I think the chances are that if you apply for a badge in another profession, it's not going to be difficult to achieve if you're now working in that new profession, so we don't
think that decision is a practical barrier in any way. We think it correctly reflects the way professionals operate, think, and behave in their working lives.

Okay, thank you. Maureen, you talked about certain universities now offering courses, and in fact Martin did as well, as one channel for recruitment. In terms of outreach for this program, what advice do you have for The Open Group and its members on where the obvious targets to focus on might be? Universities might be one, and clearly we want other Open Group members to get involved. Are there any other targets we should be thinking about that you think might be rich?

I think any enterprise, including some that are not Open Group members yet but could really get value from it, would obviously be welcomed with open arms to join The Open Group and help advance the profession. So universities are certainly a key part, along with other enterprises across industries. By the way, we see a lot of interest in data science from certain industries, but the value really can be driven in any industry and organization, so we would be looking at that as well.

We've got a brand new forum in The Open Group, the Open Subsurface Data Universe Forum, which is basically about creating a cloud-based platform for analyzing the data that oil and gas operators are getting from exploration. They have a very keen interest in data science, and this is going to be of great interest to them, I think. So that's just picking one or two industries.

I guess that's a great Internet of Things application, for oil and gas exploration and production. And I would think you would want government agencies to be participating, in public policy and in deciding what kinds of sensors to use; there's a lot of data.

Right, absolutely. And as with all the certification programs we run, when customers are asking for it, that's when there's a lot of interest, and that's the pull-through for people to take it seriously
and maybe get accredited.

Okay, a response to the answer you just gave, by the look of it: could you wear multiple role hats within a single project?

I'm sure you could. From the perspective of a certification program, we're looking at evidence of achievement of the requirements, and if you're working within a project and wearing multiple hats, there's no reason why you couldn't be simultaneously qualifying for two different things. I don't see any particular obstacle. You just have to convince the professionals who are your peer reviewers at the end of the process that you've actually done the work as a data scientist and as a technical specialist. The interview process isn't trivial, but if you've been doing it, making the case isn't going to be a problem.

Okay, two more questions, so last chance if you have one. This one is about the community aspect, so it's another plug for the workshop tomorrow. (I'm looking that way, but the slide is obviously not there.) This is the start: we now have something to build on and something to discuss. How do we evolve the profession, and how do we evolve the certification program over time as we learn more? The question was how to get involved, and I guess that's as much an Open Group question as anything else, but the answer is please do get involved. The workshop is there tomorrow, and then look out for more things we might be offering in that space. Anything to add to that? I don't want to steal your question.

That's fine. We welcome the involvement, and we really hope we get a good turnout tomorrow, but it's going to go well beyond tomorrow and the workshop. I would say reach out to any of us if you want to get more involved, and we will definitely engage.

Okay, final question, then, before we go into the break. I saved this one to last because it seemed
appropriate: how many data scientists does it take to screw in a light bulb?

We don't have any hardware guys.

It sounds like a joke. I don't know if anyone has an amusing answer.

It's an architect problem.

That's a good answer.

One fewer than it would take architects is another answer.

Okay, we'll wrap it up there. Thank you all for participating, for your insights, and for taking the questions, and thank you all for sending in all those questions and for your interest. If you have further interest, tomorrow is the time to go to the workshop.