In this short video I'm going to show you how to use the Python Script node inside of Orange. Sometimes you just don't have an interesting dataset to work with, so you can open the Python Script node, write a bit of code, and generate your own data for use inside of Orange.

I've opened Orange desktop here, and on the left-hand side you see all the different modules that I have. Let's open the Data one; what we want from it is the Python Script. I'm just going to drag the Python Script right here onto the screen. Let's double-click on it, and you see I already have a little script here. Every script you create gets listed here. You can add a new one with this plus sign. If you change anything, you just hit Update. There's a More button for importing scripts and so on, and then you see a little Python console here at the bottom. With this version of Orange we're running Python 3.7.6, as we can see there.

So there's my little script at the top. We're going to use scikit-learn to generate some data, and we're going to use the Orange.data library to handle the data for us. The script, by the way, is listed right up there as "generate data". If you click on a new one, you can double-click on it and rename it.

So let's have a look; it's a bit small there. It starts with from sklearn.datasets. sklearn is scikit-learn, a library that's built into Orange here as one of its Python libraries. From sklearn.datasets we're going to import the make_classification function, and that's the function that's actually going to generate the data for us. It's going to be a classification problem, inasmuch as the target variable is going to be a categorical variable.
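To get a feel for what make_classification hands back, here is a minimal sketch that can be run in any Python environment with scikit-learn installed. The tiny n_samples value and the random_state are illustrative choices, not the ones used in the video.

```python
from sklearn.datasets import make_classification

# Illustrative values only: 10 rows keep the result easy to inspect.
X, y = make_classification(n_samples=10, random_state=0)

print(X.shape)  # feature matrix: one row per sample, 20 columns by default
print(y.shape)  # target vector: one class label per sample
print(sorted(set(y.tolist())))  # class labels are plain integers
```

The target comes back as integer class labels, which is why it needs relabelling before it reads nicely in a data table.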
Then from the NumPy library I'm going to import two things, where and c_ (c_ is for concatenate), and then from Orange.data we're going to import *, which just means we're going to import everything from this data library inside of the Orange package.

Now here we're going to use our make_classification function right away. It is going to create two things for us, two variables, and that's why we save them here on the left-hand side. They're going to be two NumPy arrays, X and y: X for my feature matrix, y for my target vector. If you look at the make_classification function on the scikit-learn website, you'll see there are plenty of arguments. We're going to use these keyword arguments:

n_samples, which we set to 10,000, so we're going to generate 10,000 rows of data.

n_features, which is the number of columns in our feature matrix, so the number of independent variables, our feature variables. We want five of them.

n_classes, because this is a classification problem. My y is going to be a vector and I want it to have two classes, so it's going to be a binary classification problem.

flip_y, which we set to 2%, or 0.02. It is just going to flip 2% of the values from one class to the other class, to make things a bit more difficult for our learning algorithms.

random_state, which I'm setting to 12, so every time I run this script we generate the same random values.

Now my y: I just need to change that from the zeros and ones. This is a binary class; we set n_classes=2 there, so we're going to have zeros and ones. When it's zero I want it to be the string "no", and when it's one I want that changed to the string "yes". For that we're going to use the NumPy where
function. So y equals where, and then we use a bit of Boolean logic: y == 0. It goes down every one of the elements in that vector and checks whether it is a zero. If it is, so if it returns True, we change that to the string "no". If that's False, in other words y equals one, we change that to "yes". That's how the where function works.

Now we've got to concatenate these two, the matrix and the vector, and for that we're going to use this c_, and then inside of square brackets we have X, y. It's going to take every y and connect it to the right row in the feature matrix X. Then I'm going to convert that to a list object: on this NumPy array I use a method called tolist, and I store the result in a computer variable called data_array. So we're just concatenating X and y there and converting the result to a Python list.

Next up, the data tables. If you import a CSV file, it gets converted into this data table structure in Orange. Here we've got to use code so that Orange knows what kind of columns these are, what kind of variables we're dealing with. Remember, if you're doing this by dragging in a CSV file, you can change the type of a column. Because we're setting this up with code, we've got to set the types with code here as well.

So I'm going to have these five feature variables: monthly income, monthly expenses, bond repayment, vehicle repayment, and month-end saldo, which is how much money is left in the bank at the end of the month for every customer. This monthly_income I'm setting to a ContinuousVariable, and you see the function there. It comes from that from Orange.data import *, so it's one of the functions in Orange.data, and I'm going to give it the name income. I'm going to do the same for all five of my feature variables. They're
all going to be continuous variables, because that's what make_classification is going to give me. It is also going to more or less normalize the data for me, so it's going to have a mean of roughly zero and a standard deviation of roughly one. So vehicle repayment, I'm just naming that vehicle, and so on. You can see the pattern there.

Then I'm going to create one more. Purchase, I'm going to call it, and that's going to be a DiscreteVariable. It's going to be named purchase, and the values are going to be "no" and "yes". That relates to the values I created up here, the "no" and "yes". They were zeros and ones before we changed them to "no" and "yes", and now we're just creating this variable with "no" and "yes".

Now I'm going to use this Domain function, and look how we do that. We make a Python list, so it's all inside square brackets, and we list all the feature variables that we've specified there. Then, after a comma, comes the class variable, and that is going to be this discrete variable here with the "yes" and "no" values in it.

All I have to do now is use the Table function, and then the .from_list method on that Table function. I've got to specify the domain, so that's this whole domain here, and the data_array. Remember, the data_array is the concatenated X feature matrix and y target vector; we've put them all together there. We've got to use from_list because, remember, we did a tolist conversion there. So, Table.from_list: the structure of my data table is going to be this domain, and we've listed what the structure looks like. Monthly income is a continuous variable, the next one is a continuous variable, and so on, and then my class is a discrete variable. So I just create the structure of the table there, and then we fill it with the values that we concatenated up there from the make_classification function.

Then inside of Orange there's the out_data object, so I've got to create a computer
variable, out_data, and I'm just going to assign to it this whole table that we've just generated up here. You can see on the left-hand side there are some input variables, in_data, in_datas, in_learner, in_learners, etc., and then the out variables: out_data, out_learner, out_classifier and out_object. So we've got to use this out_data output variable, and I've created it there.

Remember, if you've changed anything, just hit the Update button. But now we're just going to run the script, and you see the running script; it executed without any problems. So there we go, that was our Python script.

What we need to do now is check what was generated. Let's go up there and view a Data Table. Let's click on that Data Table, and there we go: there's purchase, yes or no, yes, no, yes, no, yes, no. And we see the normalized values here generated by the make_classification function: income, expenses, bond, vehicle, saldo. So it's all there.

Remember, as always, the first thing we want to do is a bit of descriptive statistics, so we're going to go for Feature Statistics. Let's have a look at that, and we're going to color by purchase, so we see how many of the clients went for the purchase and how many did not. Across all the numerical variables we see the graphical representation, but what we're really after is the center, which is the mean, and you can see the means are at zero. We also see the dispersion, which is the standard deviation divided by the mean, then the minimum, the maximum, and the percentage of missing values. Of course there are no missing values.

Let's have a look at visualizing our data. What we could perhaps do is look at a Scatter Plot, so let's look for a Scatter Plot. There we go: all our numerical variables are listed up here. I can go income versus expenses, and we can see they're pretty poorly correlated. We can go income versus bond, and we
can see something interesting going on there. Income versus the vehicle repayment: quite a strong correlation there. And income versus the month-end saldo: you can see some interesting stuff going on there as well.

And that's it. We've generated our own data using a Python script. So there's no need to go into a Jupyter notebook or open our Python IDE. We can type a Python script right here, generate some data, and look at that data.

Just for information's sake, let's also create a Data Sampler. Let's bring that in and take a fixed proportion: 70% of the data is going to be our training data and 30% is going to be our test data. And let's create one quick little model here. Let's do a Random Forest model. There's our Random Forest model; I'm not going to go into it to set any parameters. Let's see what that looks like. Let's do a Test and Score, which we see there, and we also bring in our Data Sampler. You could see the model was running there, and if we have a look at it, we see an area under the curve of 0.95, and what was done is a 5-fold cross-validation.

And that's it. We generated our own data. No need to have any data; we can just use a Python script and generate our own data right inside of Orange.
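Pulling the steps of the walkthrough together, the script described in the video might look roughly like the sketch below. It is a reconstruction from the narration, not the exact code shown on screen: the Python variable names, the spelling of the column names, and the lowercase "no"/"yes" labels are assumptions, and the try/except around the Orange import is an addition so that the data-generation part also runs outside of Orange.

```python
from numpy import where, c_
from sklearn.datasets import make_classification

try:
    # Available inside Orange's Python Script widget.
    from Orange.data import ContinuousVariable, DiscreteVariable, Domain, Table
except ImportError:
    # Added for this sketch: lets the NumPy part run without Orange installed.
    Table = None

# 10,000 rows, five features, a binary target, 2% of labels flipped.
X, y = make_classification(
    n_samples=10000, n_features=5, n_classes=2, flip_y=0.02, random_state=12
)

# Relabel the target: 0 becomes "no", everything else (1) becomes "yes".
y = where(y == 0, "no", "yes")

# Attach the target as the last column and convert to a list of rows.
# Mixing floats with strings turns every entry into a string; this sketch
# assumes Table.from_list parses them back according to the domain.
data_array = c_[X, y].tolist()

if Table is not None:
    # Column types: five continuous features and one discrete class.
    # Names are assumed from the narration.
    monthly_income = ContinuousVariable("income")
    monthly_expenses = ContinuousVariable("expenses")
    bond_repayment = ContinuousVariable("bond")
    vehicle_repayment = ContinuousVariable("vehicle")
    month_end_saldo = ContinuousVariable("saldo")
    purchase = DiscreteVariable("purchase", values=["no", "yes"])

    # Features first, then the class variable.
    domain = Domain(
        [monthly_income, monthly_expenses, bond_repayment,
         vehicle_repayment, month_end_saldo],
        purchase,
    )

    # out_data is the name the Python Script widget sends to its output.
    out_data = Table.from_list(domain, data_array)
```

Pasted into the Python Script widget and run, a script along these lines should send a 10,000-row table to whatever is connected to the widget's data output, such as the Data Table used in the video.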