I am Srijak, I work with IBM Watson, and this talk is about platform development in data science. I will start with a quick question: how many of you attended Dr. Denise's session this morning? She said, once you go serverless, you never go back. So this is that serverless era. I am going to talk about the disruption being caused by machine learning, the various ways you can apply machine learning in your day-to-day life, the different roles or personas in the machine learning domain, and a basic architecture of how an ML platform should be. Then I will take one simple problem and solve it through a platform, in this case the IBM Watson Machine Learning platform, and I will give you a demo of how to solve it that way.

How many software developers are here? Okay, quite a few. So let us start with the disruption caused by machine learning. You see a bunch of companies here, and the relation between them is that they are all data companies. You have heard the term "data is the new oil": whoever possesses data possesses power. For example, Uber is the world's largest taxi company, but it doesn't own any vehicle. Facebook is the world's largest media platform, but it doesn't produce any media of its own. Alibaba is the most valuable retailer, but it doesn't hold any inventory. Airbnb is the largest player in the accommodation business, yet it doesn't own any hotel or any real estate at all. What they hold is data, and through that data they are causing this disruption. They are pushing aside the big companies that had traditionally been doing business in these fields for so many years; with the power of data they are competing with them and exceeding them.

So how can you use ML in day-to-day life? From the morning you get up to the evening you go to sleep, everywhere you are connected. All of you have mobiles with apps. Constantly you are providing input, and that input is getting stored somewhere and analyzed somewhere. Companies are generating insights from that data and giving you an experience that makes you stick with that app, with that company, or with that mobile itself. The web is the traditional channel for collecting ML input. Then came mobile, and now we have voice devices like Google Home and Alexa: through voice, too, the machine stores your input, profiles you, and gives you a better experience. Most of you attended the Alexa talk yesterday, where the speaker explained a lot about IoT devices and home automation. Every ad nowadays tags itself with the adjective "smart". And what is "smart"? Smart is the application of analysis to data, and most of the time it involves ML. For automobiles, we had a talk from Dr. Savita yesterday about connected cars, so we are using this in automobiles as well.

Now, the different roles in ML. We have the data engineer. We have the data scientist; I think there are a lot fewer data scientists here. We have the business analyst, who analyzes the results. And we have the application developer or software developer, as I pointed out. The data engineer architects how the data will be used and stored.
Most of the time, when you are a data scientist, you overlook the fact that this role, the data engineer, is getting data from various different inputs, cleaning it, and storing it somewhere very accessible to you, so that you get beautiful data and can build a beautiful model out of it. It is a very important role.

The data scientist, of course, is the backbone of any ML-driven application or solution. He is the one who chooses the algorithms, generates the hidden insights from the data the data engineer gave you, and gives you a model which you can use for your solution or for generating insights.

The business analyst is the decision maker. You have a huge set of data and you have generated some insight out of it; now what do you do with that? You have to act upon it. The business analyst will analyze that result and tell you how the business decision should be taken, or give you feedback on whether the result or prediction is good or bad.

And then comes the application developer. Suppose he is building a solution, say a mobile app that uses a model as its backbone: you insert the data, feed it to the model, get the result out, and the result is displayed on the mobile. The application developer is the one who uses the model created by the data scientist. He may not have any knowledge about how the model works internally, but he is the one using it; he is the one performing the scoring, the result prediction, out of the model, most of the time through a platform.

Now I'll go to a development architecture for machine learning. There are several stages that are important here. First, ingestion; ingestion is the part where the data engineer comes in. Nowadays data comes from everywhere, and it can be structured or unstructured. You can read it from a traditional relational database, from an S3 bucket in JSON format, or from IoT devices as sensor data. The ingestion stage plugs into any number of sources that hold your data, whether a Hadoop cluster or an S3 bucket, and feeds the data into your system or platform. Then you do a cleansing job on it. The data engineer has to be the one who knows that a certain column of data cannot be used, so it has to be plucked out of the relevant data. For example, in a school, say you're collecting data: age, roll number, name of the student, the marks obtained, everything. If I tell you to take the average of the roll numbers of the students, does that make sense? A roll number is just a number identifying a student, so an average of it doesn't make any sense. It is not relevant for your analysis, so you can pluck it out, reduce the volume of the data, and then your model will run faster and more efficiently. A minimal sketch of that kind of cleanup follows.
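As an illustrative sketch of that cleansing step, assuming a hypothetical students.csv with the columns from the school example (the file name and column names are mine, not from the talk):

```python
import pandas as pd

# Hypothetical file and column names, just to illustrate the idea.
df = pd.read_csv("students.csv")  # columns: name, roll_number, age, marks

# A roll number only identifies a student; an average of it is meaningless,
# so drop it (and the free-text name) before analysis.
df = df.drop(columns=["roll_number", "name"])

# Basic cleanup: keep only complete records.
df = df.dropna()

print(df.describe())  # every remaining column is now a meaningful feature
```

The same idea applies to the diabetes data later in the talk: keep only the columns that carry signal, and the model trains faster on less data.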
After ingestion, there is training. Training is where the data scientist comes into the picture. For training, you can use any number of runtimes and any number of algorithms that you know, and the platform has to provide you with all of this. Say you prefer Python: then you need a Python runtime and Python frameworks, because a lot of you use Keras, a lot of you use TensorFlow, and those have to be provided. Say you're good with Spark: then an Apache Spark runtime has to be there, and a Scala or Java development environment should be there so that you can develop your model in that.

Then deployment, and here comes another problem. You are a data scientist, you got beautiful data, you created a very good model, and now it's sitting in your notebook or on your local cluster. How do you consume it? Nobody is going to hand you their data and say, here is my data, feed it to your notebook, see what the prediction is, give me the result. A production system cannot run that way. Deployment is the part where you deploy the model into an environment, for example as a web service. Web services are very useful in the sense that you can connect to them using REST frameworks, so from any platform or any device you can access that model and use it. The accessibility of the model becomes so much larger. The data scientist creates the model, and once it is deployed in some runtime or environment, everyone gets to work with it.

And then feedback, which is also very important. When you get a result, you should not just say, okay, I built the model, I got this result, this is perfect. Most of the time that is not the case. You need to feed the results back into your training; you have to enrich your model again and again so that it keeps improving.

So I've taken a simple ML problem here. If you see the table (is the table visible?), it is a patient record for diabetes. I have blood glucose, blood pressure, insulin intake, BMI, age, and the outcome. The outcome is zero or one: whether a patient with these health parameters is likely to have diabetes. That is what I'm predicting. It's a logistic regression problem; if you are a data scientist or a data analyst, you will recognize it as one. You can compute the probability with the logistic formula, plot a graph, and get the result. But that solution is still sitting in your local environment; a rough local sketch follows, and after that I'll show you how to use a platform to solve the same problem and get the advantages out of it.
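Here is a minimal notebook-style sketch of that local baseline with scikit-learn, assuming a diabetes.csv whose column names match the table on the slide (the file name and column names are assumptions):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumed file and column names, matching the table on the slide.
df = pd.read_csv("diabetes.csv")
X = df[["glucose", "blood_pressure", "insulin", "bmi", "age"]]
y = df["outcome"]  # 1 = likely diabetic, 0 = not

# Hold some data back for evaluation, like the train/test split in the demo.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Binary classification with logistic regression: the model learns weights w
# and a bias b, and predicts P(outcome = 1) = 1 / (1 + e^-(w·x + b)).
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

Everything after this point is what the platform adds: this model is still stuck on one machine, and nothing here deploys, monitors, or retrains it.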
Here you can see a normal platform use case. You have historical data; you do visualization and feature engineering on it, then model training and model evaluation. That gives you a pipeline model, which can be a Python model, a Spark model, any of these. Then you deploy: the developer or stakeholder deploys it into a runtime environment. Then you have the operational data, another set of data that, like the historical data, can come from various streams. Operational data can come from a live Twitter stream; it can come from a web service where you fill up a form and submit the data; or it can be a batch job, say a company's monthly report, where you feed a huge chunk of data at the same time. That operational data hits the deployment, and you do the scoring there; scoring is how the prediction happens. You monitor the prediction, how good or bad it is, and that generates your feedback data. The feedback data is fed back to the model through retraining, and the cycle goes on. So this is the whole architecture of a general ML pipeline that you will use.

Now I'll switch to a demo. Since I'm using DSX, Data Science Experience from IBM, these are some DSX links, and there is a must link there as well; I'm also representing must, so you can check that out.

This is how Data Science Experience looks when you log in. You have to create a project; I've already created one for this. Inside it you have your assets: your data assets, your models, your experiments, modeler flows, a number of things. Data assets are nothing but the data you're feeding in. The data can be fed here through a file or through a connection: you can connect to a DB2 database or an S3 bucket and pull the data in. For the quickness of the talk, I've already ingested this data.

So what I'm going to do is create a model here. You click on the model and give it a name. This is a Watson Machine Learning instance that I'm using, and I'm using the model builder. And this is a Spark runtime that I'm using for the prediction. I'm using the automatic option; you can switch to manual to tweak the data as well. Here it asks which data you want to work on; I want to use this diabetes CSV. It's creating a kernel in Spark, and once the kernel is created, the data will be ingested into it and analyzed; it's loading the data now. You want to predict the outcome, right? Outcome is one or zero, so you're predicting outcome. Your feature columns are all the rest by default. It's a binary classification you're doing, so logistic regression applies here. This portion of the data will be used for training, this for testing, and we have some holdout; you can make the holdout zero as well. Next: it's training the model now. I'm using the model builder here, which is a basic UI-based tool; you can use a notebook in place of this and work in that instead.

Okay, training is complete and it's evaluating the model. Trained and evaluated. It's saying the performance is good, because the data set is small; if you use a different algorithm, this performance can vary. I'm just going to save this model so that I can show the full flow. The saving is done. This is how the model looks; you can see the input schema here.

And now the deployment part. I'll just deploy this, and I'll deploy it as a web service; you can deploy it as a batch or a streaming service as well. You just give it a name and deploy. The deployment is initializing... okay, it's a successful deployment. So the model we built two minutes back is now deployed as a web service. These are the parameters, and you can hit that endpoint from any REST client, curl or anything, to get a prediction; a rough sketch of such a call follows.
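As a hedged sketch of such a call (not the exact Watson ML API), here is roughly what scoring over REST might look like from Python; the URL, the token, and the fields/values payload shape are assumptions that depend on your deployment and the API version of your instance:

```python
import requests

# Placeholders, not real values: copy the actual scoring endpoint and an
# auth token from the deployment's details page on the platform.
SCORING_URL = "https://<your-wml-host>/v3/.../deployments/<id>/online"
TOKEN = "<bearer token>"

# One record, with the same feature columns the model was trained on.
payload = {
    "fields": ["glucose", "blood_pressure", "insulin", "bmi", "age"],
    "values": [[100, 80, 30, 25, 32]],
}

response = requests.post(
    SCORING_URL,
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(response.json())  # prediction and class probabilities for the record
```

The UI test I'm about to show does essentially the same thing: it sends one row of feature values to the deployed web service and displays the prediction.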
In the UI, I'll just show you normal testing. Say I have a blood glucose of 100 and a blood pressure of 80; I take insulin, say 30 units; my BMI is, what, 25; my age is 32. And I'll just predict. So it gave you the scoring, right? It looks fuzzy because all the parameters and the results come back together, but you can see it properly here: the likelihood of me getting diabetes for this data is zero, which is pretty much the result it's showing.

Okay, so that's the demo I had. I've captured this demo in the slides as well. The trial period is free, but you will get some limitations; for example, in Apache Spark you'll get only two executors. Those limitations are there for one month. Yes, you can ingest data from AWS, but this isn't running on AWS; I gave this demo on the IBM Watson Machine Learning platform. AWS also has a similar platform that you can use. Yes, through IBM Cloud, IBM Bluemix, you can go there and sign up.