Hello everyone, and welcome to another episode of Code Emporium, where we're going to go through some more data science interview questions. I'm on a site called Intellipaat, and they have 78 data science interview questions with answers. I'm doing this series of walking through blog posts with interview questions because for a textbook question they tend to give a very textbook answer, which is fine, but I thought I would add my two cents based on my experience as a data scientist, so that you can explain these answers better in an actual interview. In our last video we went through some of the basic data science interview questions; this time we're moving to the intermediate level.

Before we get started, please give this video a like. I'd really appreciate it, because the more people who like this video, the more it spreads, and the process continues from there. I also have a Discord server now; the link is in the description below, so please do join it and be a part of the community. We talk about all things artificial intelligence, and we'd love to have you. With that, let's get back to the video.

Alright, so: what is an ROC curve? ROC stands for receiver operating characteristic. Fun fact: I have actually been asked this in an interview, and more specifically, "have you ever implemented an ROC curve, and if so, how would you do it?" It is a plot of true positive rate versus false positive rate where you iterate over different classification thresholds: instead of fixing the threshold at 0.5, you increase it in increments of, say, 0.01, and plot the true positive rate against the false positive rate at each step. The more area encompassed under that curve, the better. You could run through this with an example, and there's a small sketch of how you might implement it further down.

What do you understand by a decision tree? Their answer: a decision tree is a supervised learning algorithm that is used for both classification and regression. This is actually true: decision trees are also used for regression, and in the regression case the value at a leaf node is essentially the average of the training labels that reached it. So at test time, when a sample comes in and gets routed down to one of the leaves, the tree can return a continuous value. Hence the dependent variable can be either numerical or categorical. The internal nodes denote tests on an attribute, each edge denotes an outcome of that test, and each leaf node holds a label. The English in their answer is a little hard to follow, but I get what they're going for; overall a decent explanation, though I would rephrase that last part. A small regression example follows below.

What do you understand by a random forest model? Their answer: it combines multiple models together to get the final output, or, to be more precise, it combines multiple decision trees together to get the final output; decision trees are the building blocks of the random forest model. In essence this is true, but what I would like to see as an interviewer is a bit more detail on ensemble learning itself and why we need something like a random forest model in the first place.
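Since the interview follow-up was literally "how would you implement an ROC curve", here's a minimal sketch of how I might do it by hand, assuming a binary classifier that outputs probability scores. The data below is made up for illustration, and scikit-learn's roc_curve and roc_auc_score will do the same thing for you in practice.

```python
import numpy as np

def roc_points(y_true, y_scores, step=0.01):
    """Sweep the decision threshold and collect (FPR, TPR) pairs by hand."""
    points = []
    for threshold in np.arange(0.0, 1.0 + step, step):
        y_pred = (y_scores >= threshold).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        tpr = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate (recall)
        fpr = fp / (fp + tn) if (fp + tn) else 0.0   # false positive rate
        points.append((fpr, tpr))
    return points

# Toy example: true labels and model scores, invented for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])
curve = roc_points(y_true, y_scores)
# Plotting these pairs and computing the area under the curve (AUC) gives
# the "more area is better" picture described above.
```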
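On the decision-trees-for-regression point, here's a hedged sketch using scikit-learn (my choice of library, the article doesn't prescribe one) showing that a regression tree returns a continuous value, namely the average of the training labels that landed in the matching leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Tiny regression problem: predict y from a single feature x (synthetic data).
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# A shallow tree: each leaf's prediction is the mean of its training labels.
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)

# At test time a sample is routed down to a leaf and gets that leaf's average
# back, so the output is a continuous value rather than a class label.
print(tree.predict([[2.5], [7.0]]))
```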
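And to make the ensemble point concrete before we go on, a small sketch (again scikit-learn, with synthetic data of my own choosing) comparing a single deep decision tree against a random forest built from many such trees.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Noisy synthetic data so overfitting has room to show up.
X, y = make_regression(n_samples=500, n_features=10, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [
    ("single deep tree", DecisionTreeRegressor(random_state=0)),
    ("random forest", RandomForestRegressor(n_estimators=200, random_state=0)),
]:
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    # The lone tree typically nails the training set but does worse on held-out
    # data; averaging many decorrelated trees usually narrows that gap.
    print(f"{name}: train MSE {train_mse:.1f}, test MSE {test_mse:.1f}")
```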
So why do we need a collection of decision trees, each making its own prediction, which we then aggregate? You can talk about how that aggregation mitigates overfitting, which is exactly what the comparison above illustrates.

What are precision, recall, and the F1 score, and how do we calculate them? Their answer says, roughly, that when we implement algorithms for classification or information retrieval, precision gives us the proportion of the positive predictions that are actually positive; basically it measures the accuracy of the correct positive predictions, and then they give the formula. This is mathematically true, but I would try to explain it in more interpretable English, preferably with an example. Say you build a fraud classifier and predict 1 for fraudulent and 0 for not fraudulent. How would you define precision in this case? Of the people we said committed fraud, how many actually committed fraud? And we can do the same for recall: of the people who actually committed fraud, how many did we say committed fraud? As for the F1 score, it technically takes the two metrics, precision and recall, and combines them into one. However, when stating this in an interview, I would add that it isn't used as much in practice, or at least I don't use it as much when working on real projects. It's easier for me to look at precision and recall separately, because although F1 makes things simpler by giving you one number to look at, it conflates two values, which makes it a little less usable. It's obviously not completely useless, but it is worth noting in an interview that, depending on your problem, you will value either precision or recall more. If you are working on a fraud classification system, it's typically more important to get high recall: you want to catch as many of the fraud cases that actually happen as possible and mark them as fraud. But you obviously don't want to go overboard with that, which is why precision is there as the accompanying metric. Overall, try to give concrete examples like this of how you would use precision and recall, why one might be more important than the other in certain cases, and, if you do use the F1 score, how you would use it in general. A short worked example with numbers is further down.

The next question: what is a p-value? Their answer: the p-value is a measure of the statistical importance of an observation; it is the probability that shows the significance of the output to the data; we can compute the p-value to show the test statistics of a model; typically it helps us choose whether we can accept or reject the null hypothesis. This answer is probably not entirely correct. First of all, we don't really accept the null hypothesis; we just don't have enough evidence to reject it. The p-value tells us, roughly, how ridiculous or not ridiculous the null hypothesis looks given the data: it is the probability of seeing data at least as extreme as ours if the null hypothesis were true. So a smaller p-value indicates that, if the null were true, data like what we collected would be very unlikely, and with the evidence we have we can say that we have enough to reject the null hypothesis. There's a small code example of this below as well.
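Here's the fraud example as a small worked sketch; the labels are made up, and I'm leaning on scikit-learn's metric functions rather than anything from the article.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical fraud labels: 1 = fraudulent, 0 = not fraudulent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]   # what actually happened
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]   # what the model flagged

# Precision: of the people we said committed fraud, how many actually did?
precision = precision_score(y_true, y_pred)
# Recall: of the people who actually committed fraud, how many did we flag?
recall = recall_score(y_true, y_pred)
# F1: the harmonic mean of the two; one number, but it hides which side is weak.
f1 = f1_score(y_true, y_pred)

print(precision, recall, f1)  # here: 0.75, 0.6, ~0.67
```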
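And a tiny, hedged illustration of the p-value framing, using a one-sample t-test from scipy on synthetic data; the null hypothesis and the numbers are my own choices for illustration.

```python
import numpy as np
from scipy import stats

# Null hypothesis: the true mean of this process is 100.
rng = np.random.default_rng(0)
sample = rng.normal(loc=103, scale=5, size=40)  # data actually centred near 103

t_stat, p_value = stats.ttest_1samp(sample, popmean=100)

# A small p-value says: if the null were true, data this extreme would be rare.
# We never "accept" the null; we either reject it or fail to reject it.
if p_value < 0.05:
    print(f"p = {p_value:.4f}: enough evidence to reject the null hypothesis")
else:
    print(f"p = {p_value:.4f}: not enough evidence to reject the null hypothesis")
```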
The next question: what is the bias-variance trade-off in data science? It's kind of funny that they say "in data science", since that's such a broad field, but essentially the bias-variance trade-off is the tension between a model underfitting because it's too simplistic and a model becoming so complex that it starts memorizing the data, to the point where even small changes in your training data can noticeably change the model. You can talk about it as a tug-of-war: simplicity versus complexity, underfitting versus overfitting, not capturing enough of the patterns in the data versus capturing so much that you're memorizing it. There's a small illustration of this with polynomial degrees further down.

What is RMSE? Their answer: RMSE stands for root mean squared error; it is a measure of accuracy in regression; RMSE allows us to calculate the magnitude of the error produced by a regression model; and then they essentially dictate the formula in words (the square root of the mean of the squared differences between predicted and actual values). This is a fair definition. You can also give examples of which regression models you would evaluate with RMSE, which is most of them, and you can talk about cases where the mean absolute error might actually be the better choice, since it is less sensitive to large outlying errors. A quick comparison of the two is below.

What is a kernel function in SVM? Their answer: in the SVM algorithm, a kernel function is a special mathematical function; in simpler terms, a kernel function takes data as input and converts it into the required form; this transformation of the data is based on something called the kernel trick, which is what gives the kernel function its name; using the kernel function, we can transform data that is not linearly separable into data that is linearly separable. This answer has the broad strokes right. You may want to be a little more specific about what the kernel trick actually entails and how it makes the calculations much simpler: the kernel computes inner products as if the data had been mapped into a higher-dimensional space, without ever computing that mapping explicitly. However, I will say that in my experience coding up a lot of models, I don't tend to use SVMs, mostly because of their computational complexity and because they don't scale well to very many training examples: the kernel matrix is an n by n matrix over the training examples, so it grows with the training data, and processing becomes slow on large data sets. In the end, SVM is still very useful to know as an algorithm, at least as a proof of concept, so for an interview do brush up on the kernel trick; just be aware that it isn't glorified as much in industry as something like XGBoost or other variants of gradient boosting. A small kernel example is below.
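Going back to the bias-variance question for a moment, one way to make the tug-of-war concrete is to fit polynomials of increasing degree to noisy data. This is a sketch with scikit-learn and made-up data, not something from the article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy sine data, invented for illustration.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Degree 1 underfits (high bias); a very high degree tends to chase the noise
# (high variance); something in between usually wins on the held-out data.
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```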
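For RMSE versus MAE, a quick numerical sketch: the formulas are standard, the numbers are invented, and the point is how a single large miss drags RMSE up much more than MAE.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean of squared residuals."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: less sensitive to the occasional huge miss."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

# Made-up regression outputs with one large outlier among the predictions.
y_true = [10, 12, 11, 13, 12]
y_pred = [11, 12, 10, 13, 20]

print(rmse(y_true, y_pred))  # ~3.63, dragged up by the single big error
print(mae(y_true, y_pred))   # 2.0, more forgiving of that outlier
```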
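And for the kernel question, a small sketch using scikit-learn's SVC on concentric circles, a classic case where a linear boundary fails but the RBF kernel separates the classes. The dataset and parameters here are my own illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: impossible to split with a straight line in 2D.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)

# The kernel trick: the RBF kernel computes inner products as if the data had
# been mapped into a higher-dimensional space, without materialising that space.
# Note the kernel matrix is n_samples x n_samples, which is one reason SVMs get
# slow on large training sets.
print("linear kernel accuracy:", linear_svm.score(X_test, y_test))  # ~0.5
print("rbf kernel accuracy:   ", rbf_svm.score(X_test, y_test))     # ~1.0
```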
And that's all we have time for today. Thank you all so much for watching this episode of data science interview questions. I skipped a lot of this article, since there are many repeated questions that I have already addressed in detail in past interview episodes, so I encourage you to check those out for more. I'll also put a link to the blog post in the description below so you can look at the questions I missed and give it a read yourself. In the end, the most important aspect of an interview is communication: you need to be asking the interviewer questions as much as they ask you, or at least to an extent, without wasting time. If you keep that in mind, I'm sure you'll do super well. Thank you all so much for watching again. Please do drop that like, join the Discord server linked in the description below, and subscribe if you want more of this. I'll see you very soon; we're going to continue this playlist of data science interview questions, and I hope to see you around. Bye bye.