It's my pleasure to welcome three key people from IBM who are running this event today. First we have Maureen Norton, who is co-author of Analytics Across the Enterprise: How IBM Realizes Business Value from Big Data and Analytics. Maureen is a distinguished market intelligence professional in the Chief Analytics Office at IBM, and currently she's the Global Data Scientist Professional Lead, helping to grow the skills and expertise of data scientists. Neeraj Madan is an IBM data scientist with over 15 years of experience in data science and strategy consulting. He's currently leading the data science practice for IBM Cloud Client Experience. In this role he is a data science advisor to executives, creating and executing machine learning and AI roadmaps to successfully realize business goals and outcomes. And finally we have Upkar Lidder. Upkar is an IBM data scientist and AI developer advocate with 16 years of experience in IT development, including team management and functional and technical leadership roles, with deep experience in full stack technology. So with that I'll hand over to Maureen. Thank you.

Okay, thank you very much, John, and welcome, everyone. We're delighted to have you here today for our data science workshop. One thing I wanted to mention if you've just joined: in the chat, please select "send to all participants." We want to make sure that all of the participants today can engage in the chat and make it a more interactive session. If we could go to the next chart, please.

So, a little bit about the workshop and what we're going to be doing today. We intend this to be an experiential journey, very hands-on. We have an upfront presentation where we're going to talk about data, provide some use cases that can help you think a little differently about data, and cover the different methodologies for data science. The intent is to shine a light on some of these methodologies and ideas so that you can come away from the session with a plan of your own, for a project you might want to recommend or lead for your own business. We'll have exercises throughout where we ask you to think about what your world is and what you might want to accomplish. Then we'll take you through an example of a common use case: improving customer experiences with real-time insight. It looks at things like net promoter scores, because that's common across industries where people are trying to make sure they are improving their client experiences. Again, that's the example we're going to use, but at the same time we want you to be thinking about what you might want to do in your own world.

There were some pre-workshop setup instructions that were sent out; hopefully everybody got those and had a chance to do some of the setup, but as we get into this we'll help you through it. In addition to myself, Neeraj, and Upkar, we have four other IBMers joining us today to help with all of those logistics and your questions. They'll be working somewhat behind the scenes, but right now I'd like to ask them to introduce themselves. Could you get us started?

Sure. Hello, everybody. My name is Collinda Malera. I'm a data scientist in IBM Cloud Client Experience, and I'm looking forward to helping you follow our use case in this session. I hope you learn a lot. Thank you. David? Good morning, everyone.
My name is David. I'm a technical support developer at IBM and help with building the notebooks. Looking forward to helping you realize your goals here. Thanks. Mariela? Hi, everyone. This is Mariela. I am a technical support developer here in Dallas, and I'm excited to be here to help you. Thank you. Great. And finally, Matt. Hello, everyone. My name is Matthew Rogers. I'm also a technical support developer, and I'm glad to be here. Great. Well, we're really glad to have you. So we've got a lot of folks who are enthusiastic to help you as we go through the entire workshop today.

The agenda is noted in Central Time, but we're just getting started. For most of this hour we're going to go through a session introduction and expectation setting, the quick overview I mentioned as an introduction to data science and some of the analytics and machine learning solutions, and then really start to create a project, so we get into the very hands-on portion. Then we'll take a short break, and after that we'll go into the hands-on portion where you're going to go through each of the steps. I think you're going to find it very interesting. We have gotten feedback from earlier sessions and incorporated everything that was recommended, so we're excited to start this with you and welcome your suggestions as well.

First, we wanted to talk about data and start to get into the right mindset for creative uses of data and sources of data. So, to get things started, feel free to join in the chat (and remember to send to all participants): think about sources of data that might fit this bill, sources that would have information about pretty much any topic, anywhere in the world, at any given time. If you can think of a source like that and want to put it in the chat, go right ahead. You may actually have a few different sources, but I'm going to keep going in the presentation and tell you about the one that comes to mind when I think about a ubiquitous source of data like that.

Twitter certainly comes to mind as one of those platforms that has information about just about any subject you can think of, from any part of the world, at any given moment. It's really been quite a story. Back in 2010, Twitter was making money pretty much the way most internet companies with free services do: by selling advertising. But then in 2014, Adam Bain, Twitter's president of global revenue, launched two new businesses for it: e-commerce (the buy button, if you will) as well as selling access to the data. Monetizing data has become a very interesting part of their business, and at that point in 2014 it looked very promising. Twitter was thinking: we have information that can help any business anywhere improve their outcomes. Certainly businesses could do research on customer sentiment about their products, because people tweet about all kinds of products. So this was very exciting, and Bain really proclaimed that Twitter data could help any business, until one day the CEO of a maker of $50,000 industrial deep fryers came in wanting to license Twitter data. Deep fryers meaning the industrial fryers that, in a fast food restaurant, are making the french fries and fried foods of all sorts.
And suddenly he was thinking: I might have just met the first business that I actually can't help, because people are not going to tweet about an industrial fryer that happened to be in a restaurant they were at. So he said, you know, I really wish I could help you, but I don't think I have any data that can really help you with this. And he felt somewhat humbled by that, because he had thought Twitter could help everybody. But fortunately, the CEO of the fryer company was thinking very creatively, and he knew there was data there. He said: well, people may not be tweeting about the fryers, but they most certainly do tweet about soggy fries. I don't know if any of you on this call have ever tweeted about getting soggy fries, but if you did, you probably never realized that your tweet could be monetized to help a business.

It turns out that when there are tweets about soggy fries, the company could use that information together with geospatial data to figure out where those tweets were coming from. And soggy fries are an indicator that an industrial fryer either needs to be serviced or needs to be replaced. So the CEO could tell, using this data, whether his fryers were in locations that were getting a lot of complaints about soggy fries, and then he could take corrective action: we have to service this machine, because these are the comments your clients are making about your fries. The other thing, which is also a competitive advantage: if he discovered through this Twitter data that soggy-fries complaints were coming from areas where he didn't have his fryers installed, he could use that as lead generation, go in there, beat the competition, and sell a fryer that would improve the quality of the product. So again, it's very interesting when you think about sources of data: you may even be able to use negative data to really improve your product.

There was also an interesting story about a woman who was putting together a birthday party for her mother and was going to do a fancy dinner, and her mother said: no, I don't want anything fancy, just a KFC meal, a Kentucky Fried Chicken type of meal. So they got the meal, had the birthday party, and pretty much the whole family agreed: everything was good except the fries. They were pretty terrible. They sent a tweet to that effect, and it went viral. A lot of people started piling on, to the point where someone tweeted: "Dear KFC, no one likes your fries. Yours sincerely, the entire world." That was quite an emotional dagger. As KFC put it, those tweets took only a matter of moments to write, but they lingered long and heavy in their hearts. And they said: instead of crying into our soggy fries, we decided to take action, radical potato-fueled action. They actually responded by changing their recipe and keeping the skins on the potatoes before frying them. They really made a huge change, and then they responded on Twitter, quoting all those negative tweets: "You told us no one likes our fries, so new ones are coming. Yours sincerely, KFC." And they addressed it to "dear entire world." So they took something that was really a negative in this data and ended up using it in their own marketing campaigns.
I thought both of those stories really illustrate the power of data that you may have to think about a little differently to see where the value is that it could drive for some of your projects. So, what other types of data do you use, or think could be used, to drive deeper insights? Is there another type of data with similar broad appeal that could be used across all industries? Any guesses in the chat are certainly welcome; start putting anything in there. And in the interest of time, I'm going to keep going.

Oh, IoT. That's a good one. The Internet of Things is producing a tremendous amount of data that can be leveraged in a lot of creative ways. Martin says: lists of reports made available by the government about specific topics. Exactly. There is a wealth of information and a lot of free data sets that you can avail yourselves of. Those are great examples; feel free to keep putting things in the chat.

One of the other ones is weather data. Weather affects everything; it's both an essential resource and a significant risk factor for life on Earth. Weather is one of the most unpredictable aspects of our human existence, yet we have no choice but to try our hand at predicting it; after all, our very lives can depend on it. And it has a serious effect on people's emotions and behavior. From food and clothing choices to shelter after a tornado alert, millions of decisions are made every day based on weather. It's really critical to business, such as guiding the route a pilot takes to avoid turbulence, when a farmer fertilizes their crops, or how an energy company mobilizes its crews after a power outage. There are so many examples of how people can use weather data, and use it in combination with other data (certainly climate data and such), to tackle problems we were not able to see before. There was a hackathon a few years back where one of the teams investigated the impact of weather and climate on animal migration in Kenya based on geospatial analysis. Another developed a waterborne-disease prevention system for Kenya based on rainfall predictions; the system predicts the likelihood of diseases, water shortages, and probable floods. So there is an incredible number of ways you can come up with to use data very creatively.

I see a few more great examples in the chat: social media is a goldmine of information, travel data is an excellent example, and gene data to drive AI. All excellent examples, and thank you for putting your thoughts in the chat. Now I'll turn it over to my colleague Neeraj to start taking us through some of the methodologies you might be considering for your own example as we go forward. Neeraj?

Sure. Thank you, Maureen. So far we looked at the data mindset, and how we can use creativity to turn even negative data into positive decisions. That opens the door to the fact that there is a wide variety of problems being solved using data science, and raises the question of how we can broadly classify or segment them. If you think about it, most of these problems could broadly be put under four categories, the first one being risk assessment, where, all the time and in a wide range of scenarios, we've been trying to screen actors for threats.
Now, when we talk about screening actors for threats, the threats could be of any kind and could come up in any possible way: credit card transaction approvals any time a card is swiped, passenger screening, loan approvals, and, among the most common topics these days, fake videos, fake photos, and fake news and reviews. All of these can fall under the bucket of screening actors for threats, or determining whether a threat is there or not. So that's the first category where we could put these kinds of problems.

The second kind of problem statement falls under quality and defects. For example, looking at a server: whenever there's a server failure, what part was the failure attributed to? Was it a hard drive or some other aspect of the server? Or it could be around castings, as in the example on the slide: given past failures or defects, is there a way to find out when these casting failures or defects will happen? Machine failures, raw materials, a lot of these things can be put under the segment of quality-and-defect kinds of data science models.

The third one is around business value and customer satisfaction. There are times when we want to find the most valuable customers, and that could be based on multiple characteristics; that's when we try to bucket them together, and that could be one class of problem, finding the value of the customer. It could also be whether the customer is happy or not, promoter versus non-promoter. It's all about value from the customer and looking at satisfaction.

And the fourth type is around price, cost, and value. Perfect real examples around us would be Zillow or Redfin, where we can see house prices and forecasts of those. Also, if we are placing a new product in the market, we could look at things like: what is the right price at which to put a new product in the market? These would all fall under the price, cost, and value bucket.

So that's how we would broadly classify them, and the reason we are sharing these examples is that as we move forward there will be some exercises that need this background. If you can think of more examples in your own context, feel free to share them in the chat for others to learn from. So, in the data science journey of projects: first, you understood some examples; second, you got an understanding of the four broad categories of models or problem types. And then the point is: okay, now you have a problem at hand, so which technique should you be using?
So this could be a very handy flowchart to keep in your kit at all times, to see which path you're going to take. How we would generally start in data science is this: the first thing we look at when a problem is given to us is what outcome is expected. Are we really trying to predict an amount, or are we really trying to find the best solution?

If we are trying to predict an amount, then it's a forecasting kind of problem; techniques could be linear regression or decision trees. Some of the examples from the last slide, like Zillow, fall here, and it could also be something like how much sales tax the government will collect this year. If no, we are not trying to predict an amount and it's not a continuous value, it's more of a categorical aspect: we are trying to predict an event. That's actually what the workshop is on today: will the customer be happy or not happy, promoter or non-promoter? If that is the kind of event we are trying to predict, then yes, we are predicting an event, and that's when we could use models like logistic regression, decision trees, or neural networks. And if that's not the case either, then we would go to clustering, where we are trying to understand, say, the value of the customers; you'll see many more examples that could be clustering.

So that's how we would go through these factors. When we are trying to predict an amount or an event, we go down the modeling path I'm sharing here; but if it's a best-solution kind of approach, which is the part at the bottom here, we would primarily look at simulation models, which I mention here. Given that the focus of the workshop is the part on the top, we'll actually stick to that. But this could be a very interesting kit for all of us to keep with us when we start looking at a problem, to understand which path to go down and which algorithms to use in our context. So that's about choosing the right analytics approach.
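In code form, that flowchart boils down to two questions. Here's a minimal sketch (the function and its labels are our paraphrase of the slide, not an official taxonomy and not part of the workshop notebook):

```python
# A minimal sketch of the "choosing the right analytics approach" flowchart.
# Function name and labels are illustrative only.

def suggest_approach(predicting_amount: bool, predicting_event: bool) -> str:
    """Map the flowchart's two questions to a family of techniques."""
    if predicting_amount:
        # Continuous target, e.g. house prices or sales-tax revenue
        return "forecasting / regression (linear regression, decision trees)"
    if predicting_event:
        # Categorical target, e.g. promoter vs. non-promoter
        return "classification (logistic regression, decision tree, neural network)"
    # No target to predict: group similar records instead
    return "clustering (e.g. customer segmentation)"

# Today's workshop predicts an event: promoter or non-promoter.
print(suggest_approach(predicting_amount=False, predicting_event=True))
```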
And this is another interesting framework that people follow when you start coding. We've all been hearing terms like agile, and data science projects can sometimes be very lengthy, while the business is always looking for ongoing updates on how your project is progressing and what milestones you are achieving. In such scenarios, because of cost pressures and sometimes governance pressures, we all fall into a situation of: it's going to take me time from start to end, so how do I communicate that? When I join my sprints, how do I give a weekly update on what I'm doing? That's where these kinds of frameworks play a very important role, and you could reuse them for your own purposes. This brings us to two specific frameworks.

One is CRISP-DM, a very well-known data mining methodology or framework, which stands for Cross-Industry Standard Process for Data Mining. It's very widely accepted and used, and has been rated as one of the top data mining frameworks on KDnuggets as well. That's the one at the top. The one below is from one of my favorite books, by Sebastian Raschka. He lays out, in a simplified way, what goes into a model: when you go through the journey of building a machine learning system, how you start by creating a path toward your goal, how you go about the data, how you apply an algorithm and retrain, and finally how you get new data and make a prediction.

For the workshop today we're going to be using CRISP-DM, and I wanted to give you both examples: one can be more high-level, one more specific. The goal today is for all of us to go through each phase highlighted here (business and data understanding, data preparation, modeling, evaluation) and run the code yourself. We have a supporting notebook that covers each of these points, and we want you to go through each of these sections with us and then go back and run the code. So that's the goal.

Given this framework, that's pretty much where we'll be spending our time. Under business and data understanding, we look at things like: when we start a project, how do we assess the situation? What methodologies do we consider? Do we look for some poster children, like benchmarks? Then, once we're able to gauge that, we look at the objectives. What objectives are we going to pursue? What features are we going to consider? And once we have the data, data quality plays a very important role. We have examples of all of those, and from there on it will be data preparation and modeling, and you'll go through them in detail for each of these sections.
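To make that journey concrete before we get to the workshop notebook, here is a minimal sketch of the roadmap Raschka describes (data, algorithm, evaluation, prediction on new data) using scikit-learn and a toy data set; it is illustrative only and is not the workshop's NPS notebook:

```python
# A minimal sketch of the build-a-machine-learning-system roadmap:
# data -> algorithm -> evaluation -> prediction on new data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data: features X and labels y, split into train and held-out test sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 2. Algorithm: fit a model on the training split
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# 3. Evaluation: check performance on data the model has never seen
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 4. Prediction: score a brand-new record as it arrives
print("new case prediction:", model.predict(X_test[:1]))
```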
Now with this, I also want to talk about setup. For the workshop today, you all should have received some pre-work instructions on registering for IBM Cloud and Watson Studio. We are not comparing any solutions, and we're not saying which is best in this workshop. It's just that we have access to IBM Watson Studio and it's free to use, so we are going to use IBM Watson Studio in this workshop. If you prefer other solutions, feel free; the notebooks can be used across multiple IDEs or whatever tools you prefer.

With this, there are two most important points I want to highlight: two skills that we are all going to be applying in this workshop. The first one is hacking skills. When I say hacking skills, it really doesn't mean that we are going to be hacking; it's that when we run the notebooks, we may run into some errors, and in that case we will have to work together, look at those errors, and troubleshoot. And the second thing is teamwork. These are the two most important skills in data science. So if you get stuck somewhere, we will all make every attempt to help each other. These are the two important things, hacking skills and teamwork, and we are all going to do our best to display them so that everyone can get through the notebook, the hands-on experience, and the example.

With this, let's do a quick audience poll. Maggie, I need your help with that. Take a minute and think about it: when you attended, or even signed up for, this workshop, what was the thought process you had? These are a few options. Maureen, do you want to add anything to this one? In case anybody's having trouble finding the poll, it should be on the right-hand side of your screen; if you have chat and other things open, you might need to close them to find it. But yes, we just wanted to do some expectation setting and have fun with the polls. The question is: through this workshop, I wish to (a) become a data scientist in the next two hours, (b) understand the journey of a data science project, (c) communicate with and lead data science teams more effectively, or (d) gain just enough knowledge to be dangerous.

The poll is now running. Let us know what your expectations are and we'll do our best to meet them. It'll be interesting to see how many people want to be dangerous, though. How much time is left? We've got about three minutes, so if you want to go on with other things; this one was longer than expected. Can you end it so we can see the results? Yep. Okay, why don't we do that. You want me to close it now? I think so; it doesn't take that long to make a selection. It's generating the report. Okay, are you able to see the results? Yes, they just came in.

Okay, so I think we got about 43% saying they want to understand the journey of a data science project; effectively communicating with and leading the projects is around 10%; and some 5% still want to be dangerous enough. Great, thank you. And pretty much, I think most of you are right on. If you've come with the expectation of understanding the end-to-end journey of a data science project, you're in the right place. And if you think this could help you understand data science work and communicate with your data science team more effectively, you're also in the right place. If you picked any other options, maybe you'll have to think again. But those are the two main objectives we'll be addressing.

One more thing I wanted to mention as we go along in this workshop, and thanks, James, for the feedback: if at any time you think we are going too fast or too slow, or if you have any other feedback, it's your workshop, and we all have to make the best use of the time we have. So if you have any feedback or suggestions, even while we are talking, feel free to send us your messages; we're reading them. Thank you, James, for your message.
And with this, I'll hand it over to Upkar. He's going to help us with getting set up. I'm going to give you control, Upkar. Yep, let me know when you have control. Thank you.

Before that: I want to make sure everybody has the capability to send in the chat to all participants, versus all attendees, because the panelists may not get to the chat. If you send to all participants, then we will be sure to get it. Otherwise, feel free to use the Q&A as we're going through this. Over to you, Upkar.

Okay, perfect. Thank you. So, as Neeraj has mentioned, this PDF was sent to you as pre-work beforehand, so I'm hoping a lot of you have finished this pre-work. If not, not a problem; it doesn't take too long, and you can do it today with us. There's a question: "I don't have the tools installed, as it's a company laptop. Will it run on the cloud?" Yes, absolutely. Everything we do today we'll run on IBM Cloud, using the link I provided in the chat. And Neeraj, if you can please supplement all of the links and everything as I talk through it, that'd be great.

Okay, so I won't take too long here. There are three or four goals, I guess, in this pre-setup. First, sign up for IBM Cloud using the link I sent you. It's fairly easy: it'll ask you for your first name, last name, and email, and it'll send you an email with an activation code. You come back, put that in, and you're good to go. It is not going to ask you for any credit card information or anything like that.

Can I just cut in here? Can you put your PowerPoint in slideshow mode? Right now we can see all of the slides, and I think people will get a better image if they can see a single slide at a time. Sure, this is a PDF file, but let me see if I can. Okay, there you go. This might be a little better. Thank you.

Okay, so as I was saying, the three services we need for the session today are Watson Studio, Watson Machine Learning, and Cloud Object Storage. Cloud Object Storage is where you store all of your data: if you're uploading any CSV file, Excel file, whatever data you have, you can use it to store that. Watson Machine Learning is the service we'll use to store our model and to deploy our model. And finally, Watson Studio is the platform that brings all of this together.

Okay, so that's my very high-level introduction, and now I'm going to actually show you how to create these services. If you haven't done this already, you can follow along. Again, I won't spend too much time on it; if you have questions, please ask us in chat. All right, so assuming you've already signed up for IBM Cloud and logged in, this is what you'll see. This is our dashboard. It shows me what services I'm using, any communication from IBM to you as a user, all of that good stuff. There are two things I want to show you here. One is the catalog, where you can find all of the services that are offered on IBM Cloud. If you look at the left menu here, I'll go into the Services tab, and for our purposes today I'll filter down to AI / Machine Learning. You can see there are quite a few services under the category of Artificial Intelligence and Machine Learning.
The ones we are interested in today are, of course, Watson Studio and the Machine Learning service. So this is one way to get to the right place. If this isn't working for you, or it feels complicated, there's a really easy way to look up these services: you can use the search bar on top. Let's do the Machine Learning service first. If you click on the search bar and type in Machine Learning, you'll see a Machine Learning service showing up under Catalog Results. So you can go through the catalog or come through this way, no problem. Click on the Machine Learning service. I leave everything as default: as you can see, it's already picked the Lite plan for me. Then you come down here; if you want to, you can change the service name, but I'll leave that as default as well. So I haven't changed anything on this page, and I'll click on Create to create a Machine Learning service.

Now, if you have already created this before: you can only have one instance of the Lite service created at a time, so that's something to keep in mind. If you see an error message saying you already have a service, that's good; you don't have to create one. Neeraj was telling me yesterday they have changed the plans a little bit, so if you create a brand-new account, you might already get a Machine Learning service and a Watson Studio service. If that's the case, that's good; you don't have to repeat the steps.

So we just created a Machine Learning service. Let's go ahead and create Cloud Object Storage. As you can see, I'm not going back to the catalog or anything like that; I'll just go up to the search bar and type in Cloud Storage. There's my object store (or you can type in "object storage," I guess). Click on that, and again we'll pick the free tier, the Lite tier, which won't cost us anything. And again, I'm going to leave everything as it is; I'm not even changing the name, and I'll click on Create. There you go: we finished two out of the three. We finished Machine Learning, we finished Cloud Object Storage. Again, if you see a message saying it's already there, you don't have to do this.

How do you know what you have already created on IBM Cloud? You can go to this menu on the left and go to Resource List. That's the place where you can see everything you have created so far. You can see we just created the Machine Learning service and the Cloud Object Storage service. And then finally, I'm going to create a Watson Studio service. I'll type in Watson Studio, click on the service name, and again I'll pick the free tier (the Lite tier), leave the name as default, and click on Create. That will create a Watson Studio service for me. There you go; it took us under three minutes to create all three services. If you are having any problems, please let us know in the chat so we can help you out. And again, one place to look at what you've already created is your Resource List page on IBM Cloud, where you'll see the three services: Machine Learning, Watson Studio, Cloud Object Storage.

Okay. Neeraj, would you like me to go through a little bit of Watson Studio, or do you want to do a poll and see how people are doing, or do you want to take a break? Yeah, if you want to give an introduction, maybe at a very high level, just to get everyone up to speed, that'll help for sure.
Got it. So let's actually import the notebook, and then we can take a break. Okay, if I now want to use Watson Studio: again, you go to your resources and find your Watson Studio service. By the way, at this point, if I type in Watson Studio on top, you'll see it finds the service I created. Right, so here's my service; I really like the search bar, it's very handy. Click on the service we just created (or you just created). It'll bring you to this sort of landing page, and you click on Get Started to open Watson Studio.

Now, one caveat: the screenshots in the PDF might look a little bit different. They just changed the look and feel of the UI (and the UX, but mostly the UI) yesterday or the day before, so the screenshots will look different, but you should still be able to follow.

Again, this is my landing page, showing me how many projects I've created. Everything in Watson Studio runs off of a project, so the first thing you want to do is create a new project. Let me increase the font a little bit; you'll see a Create a Project link here, so I'll use that to create my first project. You can see I already have some, because I've been using the service; you probably will not see anything there. Click on that to create a new project. You get two options: you can either create an empty project or you can import a project. For us today, let's create an empty project. I'll click on the first link, and I can give this project a name, so I'll say "data science workshop day two." You see here, it's already picked a Cloud Object Storage instance for me. If you don't have a Cloud Object Storage service created, it'll give you an option to create one here; it's just easier to create it even before we come into Watson Studio. But if you see an Add Service button here, not a big deal: you can use that, it'll open a new tab where you create a new Cloud Object Storage service, and it brings you back here. That's it: I just give my project a name and click on Create. Again, please let us know if you have any questions or if you're stuck on anything. I don't see much in the chat, so I'm guessing people are doing well. By the way, if you are doing well, also let us know.

Okay, so now I'm in my project page. All good, thanks, Martin. There is an Overview page, which tells me all of the resources I've used so far; obviously I haven't done much, so this is empty. For us today, Assets is the most important page. This is where all of my assets live, and there are different kinds of assets you can add. Data assets, as the name suggests, are any data files I can add here; these go directly into Cloud Object Storage. If you look on the right here, this is another way to load files: anything I load here will show up in my data assets. Great, what other things can I add? If you missed that, I clicked on Add to Project, and you can see I can add a whole bunch of other things into Watson Studio. For us today, we are adding a notebook, so I'll click on that. In your free time, go through some of these other options; they're pretty cool to play around with. For now, I'll click on Notebook to add a notebook as an asset in my project in Watson Studio. There are a couple of ways to add a notebook. (There's a question in chat; I'll come back to that in a second, Armin.) You can start off with a blank notebook.
So I can just type in a name and start off with a blank notebook, or I can upload a file from my computer. For us today, we'll do option number three, which is to import from a URL: the notebook that Neeraj has created for us to experiment with. There's also an option to select a runtime. This is where you can select different capacities for your CPU, your memory, et cetera, and different runtimes, so I can work with R or Python, I can work with Spark and Scala, and so on. For us, please pick the Python 3.6 XS option; that's the free option, and it's good enough for us to do the exercise. So I'll select that. Neeraj has pasted the URL in the chat for us, so I'll take that URL and put it here. This URL, by the way, if I open it in a new tab, is just a notebook. It's a very special notebook, because Neeraj created it for us, but still, it's a notebook. So we'll import this into Watson Studio. I've put in my notebook URL, I've selected my runtime, and let's give this notebook a name: "NPS score, day two," or you can name it whatever you want. And that's it, I'll click on Create. This will spin up a new runtime for me, the Python 3.6 runtime, and start importing the notebook.

Or not. Okay, let's do this again. If you get this, not to worry: try refreshing the page, which will sometimes fix it, or you can go back and try again. So let's go back and try this again. We'll click on New Notebook from URL, and this time I'm going to actually get the raw link, just in case; I don't think that usually is the problem, but I'm going to put the raw link in, just in case. All right, so this is my NPS notebook, raw. Let's see if this works. You know what they say about the demo gods: when you need them the most, they're not around. Okay, so obviously something is wrong with my Watson Studio project today. Are people able to get the notebook up and running? Can I get a show of hands?

Now, Arvind asked a question: "Create is grayed out for me on the Machine Learning service." Arvind, most likely you already have a Machine Learning service created, and that's why it's grayed out. In that case, if you go down here, back to the Resource List, you might already see a Machine Learning service created, so just create the Watson Studio one.

All right, so people are able to get the notebook up and running, which is very good news. I can try one more time and see what happens. Or, yeah, let's go back. I'm going to create a new project just to be safe: a nice empty project. I'm going to call this "NPS new." Bob is saying: running on a single screen, I find it impractical to mimic at the present rate; I'll listen and watch now and do it later. That's fine, Bob, I understand. The steps we have taken so far are also in the PDF, by the way, if you want to follow that. All right, let's give this another try: Add to Project, Notebook, and back here. Let's grab this file. By the way, if this doesn't work, I'm just going to save it locally and upload it, because that's the other way I can do it. If that doesn't work either, then I'm going to try a different account, or we'll just switch over to Neeraj's account and, Neeraj, you can walk us through the notebook. There's obviously something going on with my account. So, "NPS rules"; maybe that will work.
Okay, it's looking better. It didn't crash on me, so this should work. Also, I had it loading in another account, so you can use either one. That's good; it gives you a little bit more time to make sure it's loading for you. Okay, how's everybody else doing? Can I get some more responses in the chat? Are you able to load the notebook? Is it working for you? All good, I'm glad to hear it. All right, at this point, Neeraj, I think a lot of people are loading their notebooks and it's working fine. Do you want to do a little bit of a break so I can fix this, or how do you want to continue?

Sure. I think just a short break, maybe five or ten minutes, and we're going to use that for all of you to get up to speed. And I just want to say that "later" never comes after these workshops, so we're happy to go slow and have you follow along as much as we can given the time we have. But yeah, we'll use the next five minutes to see if you have any errors, address them, and then we'll move on to the next steps. Okay, I'm going to stop sharing so I can work on this real quick. And I can share my screen if you want to move the presenter role in WebEx. Yes. That's great. Subramaniam, I see that you were able to get through this one; that's awesome.

There's a request in the chat for a link to the PDF; they didn't get it in the mail. So I just posted a GitHub link out here, and this is what it looks like. Maybe I'll walk you through the files here. This is the notebook whose link I just shared; this is the presentation we are using right now; the pre-work instructions are this PDF document on how to set everything up, the steps Upkar just walked us through; this is the data set we're going to be using; and these are the rest of the files, plus some quality reports we're going to be looking at together. So this is the GitHub page, and I can post it again in the chat just so you all have it. Yep, I just sent it again.

So can we get a few yeses: how many of you have been able to get the notebook open like this one? I saw a few yeses. Martin's working, okay. Honorlingum's working, okay. "Which URL do I need to use?" If it's stuck at 99%, just refresh the browser or something; it usually just works. Okay, Jeff is set up. Good. "I already created a project earlier, then added a second project and it fails to upload; do I need to delete the first?" No, you can have as many projects as you want, so that's not a problem at all. And for the notebook URL: if you scroll up the chat, or let me post it again. One second. This is the URL you're going to use. Okay, awesome, Marianne. Honorlingum, I think I just sent you the link in the group chat; I hope you have it. Give it another attempt. Okay, thank you, David. Cool.

Can we get a few more yeses? "I was able to create a notebook from the URL." Awesome, Arvind. Okay, can we get a few more yeses? I just want to make sure that you're following along. And I know, I have attended multiple workshops myself, and whenever I said "I'll try later," I honestly never, ever tried it. That's been my experience; not sure about yours.
But anyway, this is recorded, and the material is in GitHub, so if that's more comfortable for you, feel free to do it that way too; that's all fine with us. We'll wait another three or four minutes and then we'll get on to the next steps. Maureen, is that fine? I think that's fine. Sure.

Okay. So, Vishal: stuck at 99%, but refreshing the page got it to load. Okay, awesome; that's great, Vishal, thanks for sharing. And if some of you have been sending messages to just the panelists: if you want others to also see what errors you're running into and how we're solving them, because it's a group workshop, feel free to change the dropdown from "all panelists" to "all participants" if you think your messages may help others. And if it's a very targeted question, you can just put it in the Q&A; just a reminder about that.

So, anyone else stuck or getting any errors, or have you been able to follow along? Yep, waiting for some more yeses so that we can keep going. Another minute. Okay, Arvind got through, I can see that. Honorlingum got through. Marianne got through. Joanne, are you able to get through? Is it working for you now? I see it was stuck for you at 99%. 58%, okay. Yeah, sometimes it could also be a browser issue, so you may want to use incognito mode or private mode; sometimes that helps.

"If I use the URL, it is not allowing me to create." Okay, so what error are you getting, Honorlingum? Yeah, I think incognito mode or private mode in the browser may help, so you may want to switch to that; that's another thing you can attempt. But what error are you getting? Are you getting any error when you paste that link and click on Create? Oh, why is that? It worked for many others. Do you have any spaces in the link? Okay, "not available." Wow, okay, that's interesting. Okay, so let's see one last time; I'll just try once more. So I clicked on Add to Project, clicked on Notebook, then we click on From URL, then the link goes here. And when you type the name, see, Create is grayed out at first, but once you start typing the name, the Create button gets enabled. You see that? So check whether you remembered to enter the name.

Okay, incognito mode made things work. Okay, awesome; that's a good thing to know. So maybe try that: go incognito if you're working in Chrome. It's "New Incognito Window"; that's how you get to incognito mode, or a private window. And thanks, Martin, for flagging the name field. See, it got activated, and hopefully it will open.

Okay, everyone, I think we'll keep going; there's a lot to cover, and I hope you've been able to follow. If not, we really want to try our best. "And I had to re-log in." Okay, that's good to know, thank you very much. Okay, awesome. So now, with this, if you have followed along with us so far, I think you should all be proud of yourselves, and you can say there are two things you have accomplished so far. Joanne, you're good? Okay, very happy to hear that.
So there are two things that you can all be proud of by now if you have followed along with us. One: you have set up a data science environment on IBM Cloud. That's one thing you have all accomplished. Two: you got to learn about the roadmap to building a machine learning system. When I say roadmap, I'm referring to this slide, because this is a very powerful slide. Honestly, if you're doing any data science, machine learning, or AI project, you can pretty much reuse this. It's a great framework to keep handy so you can share your project updates and show the journey from start to end as you move ahead.

Now we'll slowly start transitioning to the notebooks and start running the code together. But before we do that, as the infrastructure is set up now, I wanted to brief you on the problem statement we are solving. In this scenario, our CIO came in and said: hey, we've been having issues with our client satisfaction lately, and I really want ideas from all of you on how to improve it. This is a boardroom conversation. Then the CMO brought up a point: we all know that a customer gives their money, but a fan gives their heart, so we need more fans, not just customers. Then the CFO put on his hat and said: a 2% increase in customer retention has the same effect as decreasing costs by 10%, so we really need to be able to retain more customers. He also pointed out that the research and the numbers we have been observing indicate that 44% of the experiences with our customers are really bland. Then the CDO, the chief data officer, said: how about we use net promoter score, because that's a globally accepted measure of customer satisfaction and of how far customers will go to recommend us. And that's when the CIO said: okay, go back and do a proof of concept. I am giving a task to all of you: go ahead, build a proof of concept to predict the net promoter score, and show the results to me.

So in this workshop we have to prepare that proof of concept together and then go back to the CIO and present our model and observations. That's the journey we are going through: we will go from scratch all the way to deploying the model and sharing the results with the CIO in this project. Let me just ask for a few yeses: are you all able to follow the problem statement we are working toward today, the one we'll be solving? Are you all able to follow me? Okay, awesome. Great, thank you very much.

Now, before we even get started, these breadcrumbs are very important. For each of these phases we have combined the business perspective with the supporting code we'll run together: we'll talk about the business perspective, then go to the code and show you the coding perspective. So when we came to this project, the first thing was to understand at a high level what problem we are solving, and to assess the situation. That's one of the segments of the framework that I shared with you.
For us, when we had to assess the situation, we started looking at the historical data. In the year 2019, the company we are supporting for our CIO had 500,000 cases, and the net promoter survey response rate was 15%, because responding is a choice for customers, so only 15% respond. And then 60% of the cases were non-promoters and 40% were promoters.

In NPS terminology, a net promoter score has three broadly used categories: detractors, passives, and promoters. The customer gets a survey: how likely are you to recommend us for our support or services? It's a very widely adopted program. A score of zero to six is a detractor, seven to eight is a passive, and nine and ten is a promoter. For this project, we are treating it as a binary classification problem: we are going to term detractors and passives as non-promoters, and nines and tens as promoters. So that's some of the background on assessing the situation.

Then, moving on to the methodology: you have to understand first how the calculation works. There is a calculator, and you'll find all these links, which are very thorough; that really is a real calculator. You can see the kind of question that goes out: on a scale of one to ten, how likely are you to recommend the company to a friend or colleague? You can put in the numbers, and you can see the NPS is actually the percentage of promoters minus the percentage of detractors. The scale of NPS is minus 100 to plus 100. Anything above zero is considered okay, and an industry standard would be around the 60s and 70s; those are generally the industry benchmarks people aim for. So that's how the net promoter calculator works.

Then we went back and said: okay, we assessed the situation, we understood how NPS works. Now, are there any benchmarks for us to look at? We always need a poster child, some reference, to see: given my industry, given my area, has somebody already accomplished this? And if somebody has, then why can't I? Maybe there is something I need to do, but having a poster child is always very important for references. So this is what we looked at: benchmarks by industry and by the leaders. Apple, Southwest, and Costco have been doing quite amazingly on their net promoter scores, and that also speaks to how broadly this program has been used in the industry.

So that was the benchmarking. What are we going to do now? We got pretty much halfway through the business understanding, and now we are getting to our CIO's ask and putting together a goal statement: we want to improve the net promoter score by identifying the potential non-promoters ahead of time and proactively addressing customer issues. In this context, we are getting customer cases in ServiceNow and Salesforce, the ticketing systems. While those cases are open, we want to run predictions on them and see whether, for the customers who are not having a good experience, we can proactively work on those cases and get their experience back on track. For this, we are looking at the NPS historical data and using machine learning, plus the AI aspect that I'll talk about. And that's the business objective here.
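As an aside, the calculator math just described is easy to verify in a few lines. A minimal sketch with made-up survey scores (nothing here comes from the workshop's data set):

```python
# NPS math: detractor 0-6, passive 7-8, promoter 9-10;
# NPS = %promoters - %detractors, on a scale of -100 to +100.
scores = [10, 9, 9, 8, 7, 6, 10, 3, 9, 5]  # hypothetical survey responses

promoters = sum(1 for s in scores if s >= 9)
detractors = sum(1 for s in scores if s <= 6)
nps = 100 * (promoters - detractors) / len(scores)
print(f"NPS = {nps:+.0f}")  # 5 promoters, 3 detractors -> NPS = +20

# The binary framing used in this workshop: promoter vs. non-promoter
labels = ["promoter" if s >= 9 else "non-promoter" for s in scores]
print(labels.count("promoter"), "promoters,",
      labels.count("non-promoter"), "non-promoters")
```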
Now, like we said, it's a hands-on workshop, and one thing we want is that by the time you go back from this workshop, all of you, or at least some of you, have your own project charter that you can take back and start working on as a project. What would make us really proud would be if, in the next few months, you said you got some inspiration from this workshop and kicked off some of your own projects, keeping some of these frameworks in mind.

With that, we have put together a questionnaire as an exercise, because it's a group workshop. Here's what the first exercise is about: we would like you to think in your own context, and I'm sure we all had a problem statement in mind when we decided to join this workshop. The template is: as a [role] (CFO, CEO, programmer, developer, data scientist), I would like to [improve, reduce, increase, or decrease] the [target variable: fraud, risk, customer satisfaction, volume, or any other problem you can think of] for [the section of the business you are interested in], by [what amount] and in [what timeframe]. So take a few minutes and come up with your statement using this framework and share it in the chat. What that really allows is that if you share these statements and find somebody focusing on a similar problem, you can find their name here and maybe reach out to each other on LinkedIn and start collaborating on some of this work, if that helps you. It's also an opportunity for everyone to learn what different classes of problems can be looked at. That's the whole goal of this exercise. So we'll take another few minutes, and we look forward to your inputs in the chat; let me know if you have any questions on these guidelines. Maureen or Upkar, do you want to mention anything that we missed on this one?

I think you covered it really well. This is just one sentence that you end up writing, but it is a very powerful one in terms of scoping a project and really thinking it through. This template is something I definitely recommend you keep in mind as you're thinking about projects you might want to do in the future. And let me give some credit here to George Stark, who brought this to us. He's an IBM Distinguished Engineer who has been instrumental in helping with the data science profession certification that we did with The Open Group. So I think we'll just let people think for a couple of minutes and see if they can come up with something they want to work on as they're going through this example.

Okay, Arvind, I just shared that whole template here in the chat; I hope that helps. So we got the first one: "I hope to increase the attendee satisfaction and participation for our virtual events by 30% in three months." I love it. Okay, we'll give it just another minute and then we'll keep progressing. Just seeing if anyone has any more thoughts or has been able to make their statement. The whole reason is, you saw how we assessed the situation, how we looked at the methodology, how we benchmarked, and also how we set up the goals, and that's what we are all trying to do together. Okay, that's great.
Jivitha: as an analyst, I would like to reduce risk by at least 99% in a month. Okay, that's great. Okay, as an operations manager, I would like to minimize the OPEX for resource management by 10% year over year. That's great. Okay, as a geoscientist, I would like to reduce the number of reports that are available but overlooked by 100% within a week, by introducing a systematic and preferably computer-assisted way to label and tag them. Wow, that's aggressive, but I'm sure it's doable. Yeah, these are some great statements, and like Maureen said, they are very powerful, because any project actually starts with this one statement. I've been a panelist at conferences and in one-on-one sessions, and a lot of the people I've spoken with always ask me one thing: I've learned a lot in data science, but I don't have a project. And this is where the project starts, honestly. If you're able to come up with this one statement, you have your own project to start taking up and moving forward with. So, Maureen, you were saying something? We've had a couple more great additions in the chat there too. Yeah, okay. It's a single sentence, but it's really powerful. Keep that in mind as we're going through the rest of the workshop, because our goal for you is to be able to look at a problem you want to tackle and then apply these things to it; we'll kind of do that in the other exercises. Yeah, absolutely, you're right. And I think we have quite a few now, so with this we'll move on. The last few: Irwin, as a product owner, I would like to increase the effort for product development by 30% in six weeks. As an underwriter, I would like to reduce the claim percentage for single-risk business by 5% in two years. That's great, guys, thank you very much. And the last one: as a lead, I would like to increase the productivity of resources by 25% in three months. Awesome. So with this exercise, what we have accomplished so far is that you at least have a framework: you know how we assessed the situation, understood the methodology, looked at the benchmarks, and finally you were able to define your own business objectives that you would like to work on after the workshop. So congratulations, all of you. Now we are moving on to the next step. You all have your business problems in mind now, and the next question is: what dataset are we going to consider, and why? I'm going to keep taking both sides of the situation all the time: okay, we solved it this way in our project, and now how are you going to solve it? That's what the workshop and the exercises are about. So, in our case of NPS prediction, or customer experience prediction, we actually started speaking with our business points of contact and the data team, and we realized there are a lot of features, or data variables, we can factor in and start using in our model.
So we started with things like time. There were a lot of time-based features: when customers open a case with us, what day of the week they reach out, whether they reach out at odd hours like midnight, morning, or evening; what is the age of the account, how long they have been with us; how many meaningful updates there are; how many to-and-fro conversations there are. Then, does geography have any role to play: where are they calling us from? How much are they spending with us: what's the lifetime spend, the monthly recurring spend? And then the AI part comes in, where we actually consume the logs of what the customer types, calculate the sentiment and emotions, and build a score out of that. That's the AI part here, because NLP is part of AI. There were many other features we looked at: assignment count, support plan, account type, and severity. When a customer initiates a case, what severity do they submit it with? Severity can go from one to four, and I'll show you some examples. But the point here is that we work on data, and that's where 80% of the time goes in all such projects. We got the data; what next? We have to look at the quality of the data, and that's the key step here: is the data reliable? Because it's always garbage in, garbage out. If you don't have reliable data and you produce results, they are of no use. So honestly, when we get the data, the very next thing we do is look at it for some basic statistics and some other aspects: number of observations, features, types, large numbers of distinct values, high cardinality, correlation, missing values. The end goal is simply: can you trust your data? With this, we'll move on to the lab, and that's what we're going to try. Sometimes, with so much data around, it can be very hard, so are there any tools and packages available to look at data quality? I see one question, and I see pandas profiling suggested in the chat. So there are a few packages available which are pretty good. One is pandas profiling; another is dataprep EDA. We will use both of them, and I'll show you in the notebook. So at this time, let's switch to our notebooks if you have them open. This is the one you have in your notebook, and if you all have it open, I'm also going to refresh mine. The reason for these packages is that instead of writing 100 lines of code just to generate 10, 15, or 20 charts, there are predefined packages like pandas profiling or dataprep EDA, the other one we're going to use here, and they make our life very simple: with just one or two lines of code they give you output similar to what you would otherwise write in 100 lines. That's the reason we use some of these packages. Does that help, Saroni? Okay, cool. So once you go here, let me share my screen. The first thing we're going to attempt: if you see here, this is where we have all the details about the notebook and the project, and the next thing we'll look at is the whole index here. A few things: again, we never assumed that you have ever worked on notebooks.
So I just wanted to mention that there are two different types of cells; the key ones are code cells and markdown cells. Anywhere you see commentary, those are markdown cells, and anywhere you see code down below, those are code cells. You write your code in the code cells, and the markdown cells are generally used to give instructions or explain something; that's how the standards are generally followed nowadays. The first thing you'll have to do is click on the link where you have all of these packages. Let me see if I can also open one and run it, one second; I'll also open it in an incognito window and see if it works for me. Okay, hopefully it works. But I have one running notebook, so I would still be able to share all the results with you; that's not a problem at all. I was just hoping I could run it with you. Any other questions so far? Okay, so I'll keep trying. Since some of you have already been able to work on it and launch it, I'll keep a check on this one and revisit it. Yeah, go ahead. Neeraj, I was just going to say, I've found out they're having some network issues in the Dallas region. So if anybody's having similar problems, instead of just not doing the workshop, you can create your services in a different region and then it should work. Just FYI. Sure, yeah, no worries. Then I'll show the results maybe over here. So, is it working for you, Upkar? Meanwhile, can you create it in a different region or something? Yeah, I created it in London and everything's working. Okay, so do you want to show these runs in that case? That way people can follow how to click on run and everything. Sure, I'll open it; you can narrate and I'll just have the screen open. Yeah, sure. I have one question: I have the NPS prediction notebook that says congratulations. Oh, okay, I think you opened the incorrect notebook. Use the link that I provided above; that one was a test notebook from the workshop. Create a new notebook with this link and that should open the one we are using right now; I think you used the older one. Okay, thanks. Upkar, you already shared it, so do you want to share your screen? I'll stop sharing. And you have to make me presenter. Yeah, yeah, I'm just doing that. Yes, okay, I did that. Okay, so I do have the notebook loaded. Again, sorry for the interruption before, but I do have it working now. Yeah, go ahead. Honestly, that's generally the data science life: sometimes the tools work, sometimes there's a problem, and it's about finding the hacks. This is one of the hacks, working in another region, so let's use that and show you. One of the first steps we would like you to try: all the packages are there, and there is a Run button on the top. You can just select the cell and click Run, and that will install all the predefined packages you're going to use for this workshop. The second thing is that you can also check the versions with the second cell there.
And the reason we are doing that is that some of you may later want to use different IDEs, like PyCharm or Google Colab or other products out there. So this is just to make sure that if you get any errors, you can come back and see which versions of the packages ran well, and maybe switch to them; that's the reason for the version check. Then let's keep going. Next comes data exploration, and there are two options I put up. Between the two packages, we were surprised to learn that pandas profiling took very long, because it doesn't have parallel processing; it takes around 20 to 30 minutes to run. So we are not going to attempt that here, but I'll still show you the results. dataprep EDA is actually from SFU scholars, which we only realized yesterday; that was something exciting we found, and it's really fast because it uses parallel processing. Some of the things it really highlights: with just one line of code you can pretty much audit the data. If you click on the stats info, you can find out how many variables we are talking about here: there are 58 variables in the data and 3,900 observations, and you can see how many variables are categorical versus numerical. Okay, a question: I'm unable to see the File, Edit, View options bar. Where exactly are you referring to, Subramanian? File, Edit, View options. Okay, I'm not sure what you're trying to do; if you can tell us what you're trying to do, we'll help. So, you see, there's so much information you can figure out with just one line of code, and that's the reason we're using these packages. The next thing in auditing is that you can go and see the distribution of these data points; for example, how many are zeros and ones. Zeros are generally the non-promoters and ones are the promoters, so you can see the distribution. It's fairly evenly distributed data, 60/40, not that far apart. Then you can see it by dates, and if you go further down, you can also find missing values.
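In the notebook, those one-liners look roughly like this; a sketch assuming the dataprep package is installed and a pandas DataFrame named nps, as in the notebook:

    # one-line data audits with dataprep; nps is the DataFrame loaded earlier
    from dataprep.eda import plot, plot_missing, plot_correlation
    plot(nps)              # overview: variable counts, types, distributions
    plot_missing(nps)      # where values are missing, column by column
    plot_correlation(nps)  # pairwise correlations between variables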
For example, it will tell you that meaningful communication count has 1.5% of its values missing, and you can find a lot more, just to inspect whether the data matches your expectations. Further down, there is another single line of code to check for the missing values. That chart is a little ugly, not the one I would like, but it still gives you a good pointer: all the orange marks are missing values, and if you hover your cursor, you'll find out exactly where the values are missing. Then, once you understand which variable has missing values, you can write maybe one more line of code. For example, I saw that sentiment overall had missing values, so I can look at that, maybe go by ticket open date for each of the 57 variables, and see whether there is one date where all the values are missing, or whether there are other patterns to observe. Once you do that, you can also look at the correlation aspect, which tells you whether there is any correlation between the variables. Again, these views are quite small, and that's where I like pandas profiling a lot, because its views are much better; but this runs within a minute or two, and that's why it's good for the workshop. If you find some deep reds here, you can go further down to the next cell, and you can see, for example, that sentiment overall is highly correlated with the last-three-conversations sentiment we calculated, and even with the last conversation. So logically we may not need all three; we just have to keep one, following dimensionality-reduction principles. There's a lot you can accomplish with just one line of code, and that's the only reason we suggest you use it. After the workshop you could also use pandas profiling, which has some great views that I have in the slides and can share back. Yeah, it's in the presentation if you open the slides; if you have the PPT open, go to some of the hidden slides. Yeah, keep going. So this one, the dataprep EDA, is the last slide, and this is how the pandas profiling views would look: very neat and clean, I would say; it's just that time is of the essence there. If you go to the next one, you can see some of the alarms in the data. It tells you there are 114 distinct values in catalog names, so there's high cardinality there; then there is a correlation between the disgust description and anger description variables that we generated. So this tells you about that. The next one talks about missing data; it gives a very nice visual representation of where your data is missing, and that's where you can start building on it. Hopefully that helps you out. And the next one, I think that's pretty much it. With this, Chetanya, the question you have is: what is pandas profiling?
It's a package, pre-written code, that you can use; it allows you to run maybe a single line of command and produce the larger set of charts you would otherwise build by hand. So pandas profiling is a package, and dataprep EDA, the other one we were sharing, is a package; I hope that answers your question. With this, we want to go back to the exercise again, the one around data. Now that you've seen the journey we had in constructing our data, and what kinds of datasets we factored in to see whether they matter to the target variable, the question is: what dataset would you gather to work on the problem statement you wrote earlier? If you want to make notes, or share them in the chat, feel free; we'll give another two or three minutes so you can sum up your thoughts. We want you to follow along: in the first exercise you wrote your problem statement, and in the second exercise we'd like you to think about what datasets you're going to gather for that problem statement, which again moves you to the same level we are at here. So we'll take another two or three minutes, see what you have, and then keep making progress. Are you able to follow so far? Can we get a few yeses, or do you have any other questions we could answer? Is our pace okay, are we going at the right speed, are you able to follow us? Awesome. That's the only way we can communicate with you, so we're really happy to keep hearing from you in the chat. Okay, one question: in which phase do we identify features? It's the data understanding phase, I would say. Basically, if you see the breadcrumbs on top: first we do the business understanding, where we formulate the problem; that's the one-line goal statement you wrote. The second is data understanding, and there can be overlap between business understanding, data understanding, and data prep, but most of the time the data understanding phase is the one focused on understanding the data: what data you're going to gather, what features and input variables. That's the right phase for this, Subramanian; I hope that answers your question. Awesome. So that's pretty much it. We'll wait another minute or two and then keep moving on to the next phase, which is data prep, and have you run the code; then hopefully we'll get to the end of deployment, where you'll be able to deploy the code in production, see the results, and make predictions yourself. That's what we are heading towards from here on. With this, okay, a question: why is the weather problem used rather than the NPS problem? I've been using the NPS dataset and it is an NPS problem; the weather problem was just an example at the beginning, to give a broader perspective on how social media datasets and weather data can play an important role. The current focus is on the NPS customer experience problem, so our dataset is around that; the earlier one was just an example to show a bigger picture. Okay, with this, over to you, Upkar. When we're ready, we can move on and maybe summarize; if you want to move on to the takeaways: so, guys, another congratulations to all of you.
With this, you have yourselves been able to open the notebook, set it up, load the packages, verify the versions, explore the dataset, and perform quality audits, so congratulations for that. And with this, we'll move on to our next phase, which is data preparation. Upkar, it's over to you from here. Great, thank you. All right, so Neeraj covered some of these techniques previously, so I won't spend too much time here, but one of the things you generally have to do before creating your machine learning model is data preparation: transforming the data in a way that the model can understand and easily handle. As you can imagine, machine learning models do very well with numbers, and in our datasets, especially the NPS dataset you saw as Neeraj was explaining with the profiling technique, there were a lot of columns that were categorical or high-cardinality in nature; they had some sort of text in them. So step number one in feature extraction is generally to convert those text variables or columns into some sort of number representation, and one of the techniques used is one-hot encoding. This essentially takes all of the values or categories in a column, converts them into their own columns, and puts a zero where a category is absent and a one where it's present; when I show you an example, it'll be a lot clearer. The second step after one-hot encoding, which you'll see in the code, is normalizing or scaling our variables. The example I give is a model predicting a house price: your square footage variable and your number of bedrooms variable have different units, right? Bedrooms may go from one to three; square footage may go from hundreds to thousands of square feet. Depending on which model you're using, because the units and scales are so different, the model may favor one variable over another. In our NPS dataset, the units for revenue are very different from, say, the duration of a call or how long a ticket has been open; those are very different scales. One of the techniques for normalizing your columns is called min-max scaling, and it essentially brings every column from whatever its minimum and maximum are down to a predefined min and max. In our case, we transform all of our columns to between zero and one, and this ensures that all of the columns start off on a level playing field when the machine learning model assigns weights to them. So that's feature scaling. And finally there's something called feature selection. In our dataset there are 59 columns, but you'll see that after doing one-hot encoding we go up to 279 or 280 different features or columns. That's still not a huge number of features, but we should be able to use some techniques to reduce it further; that not only can help the accuracy of our model, it also makes the model creation and training process more efficient.
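Before we get to selection, here is what those first two steps, one-hot encoding and min-max scaling, look like in a small self-contained sketch; the column names and values are made up for the example:

    # one-hot encoding with pandas, then min-max scaling with scikit-learn
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    df = pd.DataFrame({"plan": ["basic", "premium", "basic"],
                       "revenue": [100.0, 5000.0, 250.0]})
    encoded = pd.get_dummies(df, columns=["plan"])  # one 0/1 column per category
    encoded[["revenue"]] = MinMaxScaler(feature_range=(0, 1)).fit_transform(encoded[["revenue"]])
    print(encoded)  # revenue now lies in [0, 1]; plan became plan_basic / plan_premium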
Coming back to feature selection: one way to reduce the feature set is to look at the missing values. Again, Neeraj went over this in the data profiling section, but if the majority of your observations for a given column are missing, then the question you want to ask is: what is that column providing to explain our target variable if so much of its data is missing? So one technique is just to look at the missing percentages for each of your columns and perhaps get rid of the columns with a lot of missing data. The second technique is based on the amount of variation, and this is where we look at correlation between features: if two features are very closely correlated, we can perhaps remove one of them, because they provide the same information to the model about our target variable. You can also correlate each of your features with your target variable, and that's what we do in the notebook. Okay, I think it's probably a good place to show you. Well, before we do that, Neeraj, do you want to do the next quiz first real quick? Yeah, do you want to put it in slideshow mode? Give me one second, I have to swap my screens. Real quick, I'm not sure what happened; everything's blanked out. Let me try one more time. Having multiple screens makes life easier, right? But sometimes it doesn't. That's okay, we can just go with that. Next, okay. I think this is just a teaser. I've honestly thought about it multiple times and never knew the answer, and I have to say that out loud here. From my childhood, I've always seen a, b, c and x, y used as the most common mathematical notations and placeholders, and just out of curiosity, does anyone want to guess why that's the case? What's the origin of that? Any guesses in the chat: why always a, b, c in the equations, and placeholders like x and y? Okay, that's close. Anyone else? Also, Neeraj, I just noticed this is slide number 42, which is of course the meaning of life, so good placement. Yeah, so let's go on; the next one shows the results. So this is the reason: it came from La Géométrie, I believe that's how it's pronounced, published in 1637. That's when it was set out that the letters at the end of the alphabet, x, y, and z, denote unknown variables, while those at the start, a, b, and c, denote the constants. That's the origin of x, y, z and a, b, c, and that's why throughout our schooling we have been seeing these as mathematical placeholders. Something interesting: it goes all the way back to 1637. And another teaser, one I honestly never thought about, which might be interesting to all of you: when you train and test a model, you always write the input or feature variables as capital X and the target variable as lowercase y, like X_train in uppercase and y_train in lowercase. Why is X uppercase and y lowercase? Any guesses in the chat? Anyone? Okay. I just wanted to check on Erlingham.
I hope you've still been with us and able to follow. Joanne, I hope you're still with us. Vishal, I see you were able to create the notebooks; Marianne, John, I hope you're still following along. Okay, so let's show the answers. It again comes from linear algebra. Capital X is a matrix, because the input variables generally form a matrix, and the target variable is always a vector. A vector is conventionally written in lowercase and a matrix in uppercase, and that's where uppercase X and lowercase y come from. Thanks for attempting, and with this, back to you, Upkar. All right, so going back to our lab real quick, we'll look at the three topics we just talked about, and I won't spend too much time here. Oh, I'm all the way down here; let me go back up. All right, so we finished the profiling. If you go to the section called feature extraction, you should be able to run these cells. I'm just outputting NPS, which is our data frame; this cell, which may be missing in your notebook by the way, just outputs the first five rows of the dataset. The next thing we do here is identify all of our numerical columns, which are all of these. Then we identify our categorical columns, the list of columns here. And finally we identify columns with very high cardinality, which is this set of five. Then, in this next line, we use the pandas library to concatenate these different data frames. First, we take the NPS data frame with only the numerical columns, okay? That's number one. Number two, we use the get_dummies function to create the one-hot encoding for the categorical columns, which is the second set here. And finally, down here, we apply the hash function to get a hash for the values in the highly cardinal columns, these five here. So let's run this. Again, there are a couple of ways to run it: you can click on the Run icon, or you can press Shift+Enter. I'll press Shift+Enter. You see the number changed from 16 to 23; that's how we know the cell ran. I'll do it one more time, and there we go, the cell has run. Now, I like to play around with these notebooks as I'm explaining, so I'll insert a cell above, and let's see what's inside this NPS select. Remember, we now have two data frames: our original data frame is called NPS, and we created a new data frame called NPS select after performing our one-hot encoding. Let's look at the head of NPS select. I'll try and put both side by side; it's kind of hard, but the original data frame was up here and the new one is here. As you can see, all of the values in our new data frame are now numerical; even the tribe level and so on have all been converted into numerical values. You can see down here that technology level underscore one is also a numerical value. Everything is numbers at this point, so it's easier for our model to consume. All right, that's the feature extraction section.
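For anyone reconstructing this cell later, it looks roughly like the following; a self-contained sketch where the column names are stand-ins, not the real dataset:

    # concatenate numerical columns, one-hot encoded categoricals, and hashed
    # high-cardinality columns into one all-numeric data frame
    import pandas as pd

    nps = pd.DataFrame({"days_open": [3, 10, 1],
                        "support_plan": ["basic", "premium", "basic"],
                        "catalog_name": ["alpha", "beta", "gamma"]})  # stand-in high-cardinality column
    nps_select = pd.concat([
        nps[["days_open"]],                                      # numerical columns as-is
        pd.get_dummies(nps[["support_plan"]]),                   # one-hot encoding
        nps[["catalog_name"]].apply(lambda col: col.map(hash)),  # hash text to numbers (Python's hash varies between runs)
    ], axis=1)
    print(nps_select.head())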
Upkar, I just wanted to make sure that everyone is able to run these; and also, could you show them how you inserted a new cell? Some of them may be very new to notebooks, so we have to help them out. Maybe a few yeses in the chat if you're able to follow these steps, or if you're stuck anywhere. So, to insert a new cell: because I'm so used to Jupyter notebooks, I use the shortcut key A. If I press A, it inserts a new cell before my current cell, and I use B to insert one after; B is for below my current cell, A is for above it. The other way to do it, if you don't remember these shortcuts, is to go to Insert and then insert a cell above or below. There's also another very handy icon here, this keyboard button, which shows you all of the shortcuts. Over time, as you play with Jupyter notebooks, the shortcuts become very easy and quick to use. All right, I see somebody said in the comments they're able to follow, so I'll keep going. Next, feature scaling. This is where we have different kinds of scales, like the example I was giving between revenue and the time a ticket has been open, and we will use something called MinMaxScaler to normalize all of our data values to between zero and one. Again, there are other scaling methods; actually, let me check and confirm where MinMaxScaler comes from. Right, it's not pandas, it's sklearn; I don't want to say something wrong. So MinMaxScaler we imported from scikit-learn. There are other algorithms and methods you can use to normalize your data; some of it depends on the distribution of the data, and some of the methods do better if you have a lot of outliers. Again, in the slides, Neeraj has provided links to read more on this. So let's run the cells. In this case, I take my NPS select, extract my target variable, which is little y, and also make a copy of the rest of the data frame into capital X, which is my feature set. Then I delete the column likely to recommend from X, because that's my target variable. I'll run that. The next cell here just explicitly converts everything into numbers; run that. And if I now look at just y, you'll see y is just one column; this is my target variable. The next cell uses another technique Neeraj talked about: what do you do with missing values? If there are too many missing values, maybe you just discard that column. Otherwise, this is one way to deal with them: you can use the fillna function in pandas and tell it what to put into the blank missing values. In this case, we're using the mean of the column; I'm saying, for every column, use the mean of that column to fill in the missing values. Then (this next line here is actually redundant) I create a new MinMaxScaler, telling it to scale everything to between zero and one, and I use the fit function to run that scaler on my data frame X. Then we just display it, and as you can see, everything now is between zero and one; all of my numerical values are between zero and one, and there should be no blank values anymore, because everything has been filled with the mean of its column. Okay, so this step is feature scaling.
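That cell, roughly, in a self-contained form; the numbers are made up, and the target-column handling is as just described:

    # impute missing values with each column's mean, then min-max scale to [0, 1]
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    X = pd.DataFrame({"revenue": [100.0, None, 5000.0],
                      "days_open": [3.0, 10.0, 1.0]})
    X = X.fillna(X.mean())                                      # mean imputation
    X[:] = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
    print(X)                                                    # every value now in [0, 1]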
And then the last thing I want to show you is feature selection, where we look at correlation. Here we are saying we only want to keep 30 features. If I do a length of X dot columns, we currently have 279 features, and I'm saying we just want to keep 30. The notebook defines a function that calculates the correlation between each feature X of i and the target variable y, then basically loops through and keeps only the most relevant features based on that correlation; we compare each feature with the target variable and keep the top 30. Oops; if you get an error like this, it's okay, it most likely means I forgot to run something. In this case, I forgot to run these cells; if you run that and then run that, it should work. So here are my top 30 features. Again, we went down from 279 to 30 features, and then I just store them in an array, okay?
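In plain code, that selection step is roughly this; a sketch with illustrative names, ranking features by their absolute correlation with the target:

    # keep the k features most correlated (in absolute value) with the target y
    import pandas as pd

    def select_top_features(X, y, k=30):
        corr = X.apply(lambda col: col.corr(y)).abs()
        return corr.nlargest(k).index.tolist()

    # usage, continuing the earlier sketches: X = X[select_top_features(X, y)]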
Okay, with that, let's go back to the slides real quick. So we looked at feature extraction, scaling, and selection; if you ran the cells, you've gone through that process. Let me see, are there some hidden slides here? All right, next: I don't know, Neeraj, if we have time to go through some of these exercises, or should I carry on with the notebook? Maybe just explain what it is; hopefully people make some notes out of it, set the context, and then we can keep moving on. Right. So the exercises are for you to discuss and discover potential issues with your dataset. Remember, the first exercise was to come up with a problem statement, and then we asked what kind of data you would like to use with it. Now we're asking you to figure out whether you see any problems with that dataset. Neeraj would probably agree with me: I've never seen a completely nice and clean dataset unless it's a made-up example for a contest or something, unless it's Kaggle. And that's never the real world; Kaggle is a good practice place, but real-world datasets are different from what we see in the competitions, for sure. So if you can, later on, make a list of potential problems you can see with your dataset. We gave you a couple of examples: a dataset may be coming from a lot of legacy systems, so first of all, how do you collect all that data? That might be hard. Then, if you do have the data, do you foresee a lot of duplicates coming in? Maybe mistakes in data entry could be problematic. Or perhaps there are a lot of missing values, data you just don't have. Going through this exercise and listing the problems you may see later also helps you with data profiling: if you already have an inclination of what might go wrong, you can focus on those features and columns and maybe spend more time getting that data from an alternate source. So I think this is a worthwhile exercise to do. All right, so in our journey here, we have now looked at how you extract, scale, and select features for your model, and we are ready to actually embark on creating it. Neeraj covered some of this before: typically we divide machine learning problems into supervised and unsupervised, and people probably know this. The supervised version is where I already have a labeled dataset that I'm training on, versus unsupervised, where my dataset is not labeled. In our NPS problem, because we have the target variable, this one here, likelihood to recommend, it is a supervised learning problem, because our dataset is labeled. And since it's supervised, I can then go into different categories of algorithms based on what I'm trying to do. In our case, we are trying to predict a class, zero or one, for the likelihood to recommend, and therefore it's a classification problem; whereas if I'm trying to predict, say, how long a call is going to last, or the price of a house, I'm predicting a continuous variable, and that generally is a regression problem. On the unsupervised branch, there are different kinds of problems you can look at. Clustering is one: a typical example I use is, if I give you a whole bunch of dimensions about a coin, the weight, the size, the density, can you cluster them into pennies and dimes and quarters? I also went to school in Canada, so I want to say maybe loonies and toonies, whatever that might be. That's an example of clustering with an unlabeled dataset. Association is a very common one: when you go on Amazon and it says, because you bought this, you might like this; or Netflix, the recommendation systems, those are association problems. And then dimension reduction is something we actually touched on: we looked at very basic techniques for it, just the percentage of missing values, but you can use unsupervised learning techniques to do that as well. The example here is also really good: make the best outfits from a given set of clothes, or, given a set of ingredients, what can you prepare for dinner tonight? That's another dimension reduction problem. All right, so here's an exercise for you. Given all of these different problems, take a quick five to ten seconds and try to figure out, let's say just for spam filtering: where do you think it would go? Is it a supervised problem or unsupervised? If it's supervised, is it classification or regression? Can somebody put in an answer just for spam filtering, and then we'll move on? I do see one question: how long can we use and practice on IBM Cloud? So the free tier is yours to keep; the only limit is the amount you can use per month, but it doesn't expire or anything like that, so you'll have it to practice with later. Unsupervised, okay; so you're saying spam filtering might be unsupervised, clustering, okay. Yeah, it really depends, but we have identified spam filtering as a classification problem because generally, and yes, Jeff is saying classification, you have some target variable. The way this works is that I have 100,000 emails from before that some human has gone through and labeled: yep, this is spam, ham, spam, ham. And we use that as labeled data to then train our classification algorithm.
So spam filtering in that case is a supervised algorithm, classification. Thank you for taking part in that. I'd encourage you to go through the rest of these and try to figure out why we have labeled them the way we have; of course, you can reach out to Neeraj or myself if you have any questions. All right, so how do you measure your model? Great, you have understood what kind of model you need; how do you actually know the model is working? If you remember, in the notebook we divided our dataset into training and testing, and that's the trick: we train our models on the training data and never show them the test data; we evaluate on the test data. There are different methods to do that, but that's the general idea. The metrics themselves vary depending on what kind of model or algorithm you're looking at. For classification, you typically see metrics like these. Accuracy is how often the classifier is correct. To get the actual metrics, you look at something called a confusion matrix, which is pretty easy to understand: it's basically your predicted values as the columns and the actuals as the rows, and essentially you're just counting how many times the prediction matched the actual, in both the false and true cases, and how many times it didn't. From that you can compute the different metrics: you can look at accuracy, precision, and recall. All of these come from the confusion matrix; they're just ratios of different values in it. Which one matters depends on what's more important to you; you might look at one metric over another, and sometimes it's a combination of metrics. The F1 score, for example, has a formula that uses precision and recall. The notebook will go through how to create this for all of our models. And then, finally, you don't want to try just one model; you want some sort of benchmark model to start with, and then see how you can improve on it. In the notebook we have tried about a dozen different models, and based on the accuracy, precision, recall, and F1 score, we finally pick one and deploy it; I'll quickly show you that. Do you want to confirm on time: are we running over a little bit, or do we still have time? I think we have another 28 minutes; we're running fine, and we should be able to cover this topic. Perfect; I saw some folks had to leave, so I was wondering if I was going over. Okay, so going back to where we finished feature selection: we now split our dataset into training and testing. We use this very handy method from scikit-learn called train_test_split, and you basically tell it what the split should be; in this case, we're saying 30% goes to test and 70% to training. The random_state variable is just so that we can reproduce the training and testing datasets later on; if I omitted it, it would most likely give me a different training and testing split every time. So, just so you can reproduce what we are doing, we put in that random_state. Okay, let's run this. All right, so now I have my testing data and my training data.
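That split cell is essentially one line; a sketch continuing the earlier examples, where the random_state value is arbitrary:

    # 70/30 train/test split; fixing random_state makes the split reproducible
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)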
Next, I define a utility function that calculates the precision and the recall. Again, I'm using the pre-built scikit-learn function precision_recall_fscore_support, and you pass it the actual value, y_true, and then the predicted value, what my model predicted. Okay, let's run that; that just created the function. Now, the next couple of cells each import some kind of model. For example, in this case I'm importing the logistic regression model, and then I train this model on my dataset, which is what this fit call does; fitting is, in a sense, training on my training data. Then I use the predict method to calculate the predictions, which I can then pass into my calculate-metrics function, which is up here. If I run this, it basically says: it trained logistic regression on my training data, tested it with my test data, and came up with this confusion matrix. If I go back to the slide, that confusion matrix is basically this here; it's missing the column and row headers, but it's just a table like this. It also gives me my precision, my recall, the F1 score, and the accuracy. All right, so I've done this for one model. Now I'm going to go through real quick and press Run a few times so we can do it for the rest of the models. Again, this is typically done in machine learning and data science projects: you don't just take the first model you get. Generally, you create a lot of different pipelines, test different models, and pick the best one, or, as I like to say, the best one that works with your dataset; if I gave it a different dataset, maybe a different algorithm would work better. Okay, and then this line here puts everything together: now that we've got the individual metrics, this takes all of the classifiers, CLF1 to CLF11, and outputs the metrics as well as the confusion matrix for each one of them. You can see the first two are quite different; the numbers are different. So, if I go back, let me move this a little bit here. Okay, you can see here: for my logistic regression, my true negatives are 619, my false negatives are 395, my false positives are 79, and my true positives are 81. Generally you want your main diagonal to be very solid, and for a two-by-two matrix you want the other diagonal to be low numbers. Again, you can compare the different models, but the confusion matrix is just one thing; then you want to compare the metrics derived from it, and you can see the precision and the accuracy differ from model to model. Actually, it's kind of hard to read it this way, so Neeraj has created a graph in the next cell; sorry, it keeps taking me all the way down. Oh, actually the graph is right here in the same cell. This is a little easier to read: you can see the Gaussian process classifier is the most accurate for us. But again, you have to take this in context; you have to figure out which is more important in your use case, accuracy or precision. And if I go down here, again, we have charted the different accuracies.
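Each of those model cells boils down to the same pattern; a sketch continuing the earlier examples, with logistic regression standing in for any of the dozen classifiers tried in the notebook:

    # train one classifier, predict on held-out data, and derive the metrics
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)                 # "fit" is the training step
    y_pred = clf.predict(X_test)              # predictions on data the model never saw
    print(confusion_matrix(y_test, y_pred))   # 2x2 table: actual rows vs predicted columns
    precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="binary")
    print(accuracy_score(y_test, y_pred), precision, recall, f1)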
Again, accuracy is just one of the metrics you look at. All right, so based on my metrics, I now understand how the different models are performing, and I've picked my best model. In this case, I think we picked logistic regression, if I'm not mistaken; we picked CLF10, so whatever CLF10 was, we can go up here and see. Oh, we picked the gradient boosting classifier. Okay, great. So now I have my model and I would like to deploy it. Let me go to the slides real quick; I'll come back to this, and Neeraj can talk about this slide. I really want to show you the deployment stuff. Okay, so my next step is to save the model. I'm now using the Watson Machine Learning client, our WML Python package, which is used to save and deploy your models. In order to use this code, I need this set of credentials, and to get those, remember the resource list I told you about: you can go back to your resource list at any time to grab your credentials. If you're still following along in the notebook, this step is important, so please look. Click on the little hamburger menu and go to Resource List; if you are inside Watson Studio, you can go all the way to the bottom, where you see IBM Cloud and then Resource List. Also, Neeraj has posted the resource link; Neeraj, if you don't mind posting that again, you posted the direct URL, so people can go there directly. It'll bring you to this page, and as you can see, I have my machine learning service. Click on that, and then I can grab my credentials. I'll go to Service Credentials; you see there are no credentials right now, which is fine. I'll just use this button to create a new set of credentials, leave everything as default, and click Add, and now I have my JSON. My credentials are basically a JavaScript object with a bunch of properties: some sort of API key, a URL. I'll take this whole thing and copy it. Don't share this with anybody else; I'm going to remove mine later, but this is your credential, so you should keep it private. Then I replace this JSON with the JSON I just copied, and we'll run this. All this does is create a new machine learning client with my credentials. Click on Run; you'll see in a second this should finish, and now I have a whole bunch of methods on my client that I can use to save my model, deploy my model, predict against my model, et cetera. Actually, let me insert a cell above. That one finished, so now I can do things like this: I can say client dot repository dot, and there is a very handy method, list, to see what models I have stored right now. Okay, this one came back empty, so right now I don't have any models stored in my cloud account. The next cell says store model, and I'm storing the CLF10 model, which, remember, was the gradient boosting model, and I'm giving it a name; this could be anything you like. Let's run this, and my model has been stored. Now, if I go back up to the previous cell and run it again, we should see that model come up. Great, so this is my model: it was created today, it was built using scikit-learn, and this is the GUID. The GUID is important: it's what I use to do anything with the model; I have to pass in this unique identifier. Okay, so now I've stored my model; by the way, it gets stored in the object store that we created. That was the first thing we created, if you remember.
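For reference, the save step looks roughly like this. This is a hedged sketch based on the older watson-machine-learning-client V3-style API used in this session; method and property names differ in newer versions of the client, and the credentials shown are placeholders:

    # create a WML client from your own service credentials, then store the model
    from watson_machine_learning_client import WatsonMachineLearningAPIClient

    wml_credentials = {"apikey": "<your-api-key>", "url": "<your-wml-url>",
                       "instance_id": "<your-instance-id>"}   # placeholders, never share real ones
    client = WatsonMachineLearningAPIClient(wml_credentials)
    client.repository.list_models()                           # empty on a fresh account
    stored = client.repository.store_model(
        clf,                                                  # the gradient boosting classifier picked above
        meta_props={client.repository.ModelMetaNames.NAME: "NPS prediction model"})
    model_uid = client.repository.get_model_uid(stored)       # the GUID used for everything else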
Now I can deploy this model. In order to deploy it, I first need to retrieve it. This here says: go to the repository and get me the model, and I'm passing in the stored model details that were returned in the previous cell. Once I have my model, I can then, let's see, call get model UID to get my unique identifier, and then get the details of the model. So here I'm not actually deploying yet; I'm just getting the details back. So I got my model details back; that's all I'm doing, just retrieving the model I stored in the previous cell. In the next cell, I create my deployment: client dot deployments dot create. Remember, previously we were working with client dot repository; now we are working with client dot deployments dot create, taking my published model and giving it a name. Let's change this name, just for fun; that's how I get into trouble, because I keep changing things. So we'll create a new deployment with this name, and then we grab the scoring endpoint by using the get scoring URL method. Let's run this. Okay, it's initializing, and let's see. Perfect, it actually created my deployment; it's giving me a new UID for the deployed model. The thing to understand here is that the saved model is different from the deployed model, okay? Now, I want to show you something interesting. Let's go back to our project; right on top, I'm going to open this in a new tab. In my project now, under Assets, you will see what we have done so far. I had my notebooks from before. That's interesting; okay, it's not showing up yet, but it should. Since we've already deployed it, it should show up as a deployed asset. It's not showing up yet, which is fine, but I know the model was deployed because I got my unique identifier back. Maybe your account or team is different in that one; that could be the reason. Yeah, it's very possible; I'm not going to debug right now, but since the notebook tells me it was created, I'm going to trust that the model was created. And then, finally, I can use the scoring endpoint, which I got from up here; if you remember this one, the scoring endpoint, let's look at that real quick. So, insert cell below, and let's look at the scoring endpoint. Okay, this is my scoring endpoint; this is what I use to predict things, so I can use it here. Actually, this line is not needed; it's already stored. In this method, all we're doing is sending some JSON up to the scoring endpoint, getting some results back, and returning the prediction. Now, some people get confused by all the arrays that come back, so let's actually print this out; let's print out the predictions. Okay, I'll run this; it won't print anything yet because it's just defining the method. So I've defined it, and, ooh, I already broke it. Like this; let's run this. Okay, it's still broken; I think it's these ones here. Oh, that's fine, the return needs to go outside, maybe. There. Okay, this is called fixing code at demo time. So now I can print the prediction, just to show you what comes out. All right, run that. And now this is the one that's actually doing the prediction; this calls that function, and you can see it's calling predict on every row.
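The deploy-and-score pair of cells, roughly, under the same caveat: this follows the older V3-style client API, the payload follows its fields/values convention, and the names are illustrative:

    # deploy the stored model, then send rows as JSON to the scoring endpoint
    deployment = client.deployments.create(model_uid, name="NPS deployment")
    scoring_url = client.deployments.get_scoring_url(deployment)

    payload = {"fields": list(X_test.columns),
               "values": X_test.iloc[:15].values.tolist()}   # the first 15 rows, as in the demo
    print(client.deployments.score(scoring_url, payload))    # predictions plus class probabilities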
And finally, this is doing the actual scoring: I'm sending in the first 15 rows, zero to 15, from my training dataset, just to do predictions. If I run that, this print predictions call is actually this here; that's why we have to do all these values[0][0]-style lookups, just to get inside the nested arrays that come back, that's all. And then this is the table I'm printing at the end, and you'll see here I have my target variable at the end, hopefully you can see it: there's the target variable and then the probability of that class. In this case, you can see this row here has a target prediction of one with a probability of 0.44, versus this one, which has a one with 0.49, and this one has a zero with 80% probability. So when machine learning models come back with a prediction, they always come back with some sort of confidence score, which is important for you to look at. All right, that's the end of the notebook. Hopefully it runs for you; the only thing I expect you to do differently from what I've done is grab your own credentials. If you grab your own credentials, everything else should work as-is. So with that, Neeraj, I'll pass it back to you to talk about this slide here. Thanks, Upkar. So, now that you've been through all the notebooks, I think what's really important at the end of the day is this; Upkar, can you put it in slideshow mode, it'll be easier. The business will never understand the code behind the feature engineering you did or the scaling of your data. What they really want to understand is: okay, can you explain to me, like an end user, what you are really trying to do here? And that's where this kind of slide can make a difference. This is the one-slider for when somebody asks: hey, can you explain the project to me in one slide? If we go back to the CIO where we started this whole workshop, this is the slide we'll start with: you asked us to put together a proof of concept, and this is how it's going to work from the end user's point of view. We take a feed of the features, which are time-based, geography-based, and money-based, plus the AI part, the NLP sentiment- and emotion-based features. Then, on the right, we make predictions about where the customer's sentiment, or experience, stands right now. Down below is the journey of a case; CES here is a customer service case. The case opens on day one, and the journey can go on for an unknown number of days, even up to 200, depending on the case journey and how technical it is. Our goal is to catch the signals coming from the cases starting from day one; the signals continue to grow, and the later we catch these cases, the more likely we end up with a non-promoter. So the goal of this project, and the proof of concept we will present to our CIO, is: okay, we built a model, we deployed it, and this is how it's working. And the next thing the CIO is going to ask is: look, great job, can you share the results? That's where another point comes in. If you remember the last notebook, one of the things Upkar was sharing was the binary prediction, where zero stands for non-promoter and one stands for promoter; but there were too many cases.
Given that we have millions of cases and you've made that many predictions, the business says: I'm sorry, I don't have time to consume this, and your work is done. And you probably know that something like 80% of data science projects never even get deployed. So how do you take that one last step so it does get deployed? The business is telling you: great model, everything is fine, but I don't think I can consume it; there are too many predictions and I don't have the bandwidth.

That's where the probability score makes a big difference. In our case, what we'd recommend is using the confidence score as a threshold. Maybe you want to be really sure about the predictions you're giving them, so you say: only give me the 0 predictions, the non-promoters, and only where the probability is 0.95 or above, 95% confidence and up. That gets you down to very few predictions, maybe 20 or 30, and those were our workable predictions. If you can turn those around with the business, they'll all be fans of yours, and you can turn around some genuinely poor customer experiences. So the probability played a very important role; a short code sketch of this filtering follows below. This is how you'd end up presenting it, and this is how we'd have gone to the CIO of the company and presented our work. So that's one.

Now, one final question the CIO is going to ask: okay, great job, how are you going to deploy it? With this, let's move on to the next slide. You've already done the lab, so the exercise is: what would you add on top of it? On the takeaway slide, you've pretty much built a model and evaluated its performance metrics, like the chart Upkar was sharing, and now the CIO is saying: I'm very happy, I'm excited, and really proud of you; you all did a great job with this exercise, very thoroughly. Now, how are you going to deploy it? Because we don't want this to end as just a proof of concept. That's when you put the architecture together: how data collection, scoring, and workflow integration fit; how you'd align it with case management systems like ServiceNow, Salesforce, Remedy, or whatever other solution is there; and how you'd take feeds from multiple data sets, create a pipeline, and score within Watson Studio or a comparable platform. So this is the architecture diagram, the solution view. Then, on to the next one.

This next slide is one possible way for the business to look at it. This chart shows how the predictions are generated and how likely each customer was to be unhappy at the time you ran the predictions. Earlier you saw everything encoded, but here you've reversed the encoding and can show the business the actual values: these are the cases, and this is the likelihood. On the left chart (a dummy chart for illustration) you can see how the predictions are spread globally, where customers have been expressing a poor experience, so you can act on it.
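Here is the small sketch mentioned above of the confidence-threshold filtering Niraj described. It is a minimal illustration assuming the scored cases have already been collected into a pandas DataFrame with hypothetical column names (case_id, prediction, probability); the 0.95 cutoff and the 0 = non-promoter encoding come from the talk.

```python
import pandas as pd

# Hypothetical DataFrame of scored cases; in practice you'd build this
# from the scoring-endpoint responses shown earlier.
scored = pd.DataFrame({
    "case_id":     [101, 102, 103, 104],
    "prediction":  [0, 1, 0, 0],           # 0 = non-promoter, 1 = promoter
    "probability": [0.97, 0.61, 0.72, 0.99],
})

# Keep only predicted non-promoters we are at least 95% confident about,
# so the business gets a short, workable list instead of millions of rows.
THRESHOLD = 0.95
workable = scored[(scored["prediction"] == 0) &
                  (scored["probability"] >= THRESHOLD)]

print(workable)  # just the high-confidence at-risk cases, e.g. 101 and 104
```

The point of the design is that 20 or 30 high-confidence rows are something a support team can actually act on, where millions of raw predictions are not.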
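As a companion to the architecture discussion, here is one heavily hedged sketch of the "workflow integration" step: pushing the filtered high-confidence cases into a case-management system. The URL, table, credentials, and field names below are all illustrative assumptions (a ServiceNow-style REST call); a real integration would follow whatever API your case-management system actually exposes.

```python
import requests

# Hypothetical hand-off of the filtered cases into a ticketing system.
# The endpoint, auth, and payload fields are placeholders only.
SNOW_URL = "https://example.service-now.com/api/now/table/incident"

def flag_case(case_id, probability):
    # Open an incident so a human can follow up on the at-risk customer.
    response = requests.post(
        SNOW_URL,
        auth=("integration_user", "***"),
        json={
            "short_description": f"At-risk customer on case {case_id}",
            "description": f"Predicted non-promoter with {probability:.0%} confidence",
        },
        timeout=30,
    )
    response.raise_for_status()

# 'workable' is the filtered DataFrame from the sketch above.
for row in workable.itertuples():
    flag_case(row.case_id, row.probability)
```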
Coming back to the slide: the right chart talks about how valuable these customers might be. All of them are important, of course, but it depends whether you're looking at customers with a minimal lifetime spend, growth accounts, or very mature ones; you can see the quadrant in the top right corner. This is the way a business would consume it, and that's how the CIO would see it when you present to him.

With this, the lab is: how would you present your work as an end output, a webpage, an app, or a consumable workflow? In our lab you've all saved the model and run it, and the exercise is: how do you plan to have the outputs of the model consumed? Because it's a great exercise and a great proof of concept, but if you don't put it in the hands of the consumer, honestly, it's of no use. That is one of the most critical steps, and it's the last part we wanted to share with you. Our CIO is really proud of all of you now, because you've gone through a long checklist and can say you've done it: if you've been able to follow all the steps, you've generated ideas on how to consume the predictions and integrate the solution into a business system. So that's pretty much the last step you've all completed. And with this, Maureen, over to you.

Hey, thanks, Niraj, for helping out with this experiential journey. And for everyone who has hung in there, this was quite a journey; hopefully it was what you signed up for. I'd like to acknowledge the people you see on the screen here, including George Stark, who was key to establishing this session as well as the data scientist professional certification available through The Open Group. I'd also like to thank all the folks who have been helping behind the scenes answering questions, and Andy Ellis, one of the people we've been working with on the profession over the past few years. So thank you all so much for joining; I'll turn it back over to John and Maggie. Again, our sincere thanks, and we look forward to hearing from all of you in the future.