Hello everyone and welcome to another episode of Code Emporium, where we're going to talk about some data science myths. There's a lot of talk about what a data scientist should be and what data scientists should know, but a lot of it is misconception, and we're going to debunk some of those right now. Before we get to the video, please do give it a like; the more likes it gets, the more other people will see it too. Please also join our Discord server, linked in the description below. We'll be talking about some amazing topics there, and we'd love to have you as part of our wonderful community. Ring that bell if you haven't, subscribe, and let's get back to the video.

You need a PhD to become successful in the field. Not true. For one, I myself do not have a PhD, and I have been a full-time data scientist for about two years now. I wouldn't say I'm successful yet, but I'm technically on track to becoming successful in some way. I also know a lot of people who have an undergraduate degree in the field and are doing just fine as data scientists. A lot of this myth stems from a misunderstanding of what data science is actually about. It really isn't just about programming and mathematics. A lot of data science is about solving problems and communication, and those are skills you don't necessarily need a PhD to learn. You certainly can develop them during a PhD, but there are plenty of other ways to bolster your communication and problem-solving skills. That's also why you see so many people transitioning into data science from other job backgrounds: it's the problem-solving that attracts them.
That said, a PhD is super useful at larger companies like Google, Facebook, and Amazon. It's not impossible to get a data science position there without one, but they certainly do prefer PhD candidates. Almost everywhere else, though, there are plenty of data science opportunities without a PhD.

Data science is all about modeling. You probably wish it were all about modeling, but in reality it is not. There are multiple steps in the data science pipeline from start to finish; data science is all about solving problems, and machine learning and modeling is only one aspect of it. So let's list out the parts of that pipeline. First, you need to understand what the problem is: a clear definition of the problem, its potential inputs, and the output that is expected. Second, you need to do some data analysis to see which inputs can be used to produce the output your stakeholder requires. During this entire process, you are consistently communicating with the stakeholder about what they actually want as an output. Third, you need to start making decisions. Do you need to use machine learning at all? Some problems are simple enough that you can solve them with a simple SQL query. And even if machine learning looks necessary, can you really use it? There are data quality and data quantity bars you need to meet. You might have a lot of data, but if that data is not good enough for learning the patterns that machine learning is used for, then you can't really use machine learning. Likewise, if you don't have enough data, you can't use machine learning either. Fourth, once you make the decision to go forward with machine learning, you can actually do some modeling, which is the part everybody's really crazy about.
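The decision step just described, "should we even use machine learning?", can be sketched as a simple checklist. This is only an illustrative sketch: the function name and the threshold numbers are assumptions made up for the example, not industry standards.

```python
# Hypothetical sketch of the "should we even use ML?" decision for a
# tabular prediction problem. The thresholds below are illustrative
# assumptions, not real rules; every project sets its own bars.

def should_use_ml(n_rows: int, label_noise_rate: float,
                  solvable_with_query: bool) -> str:
    """Return a rough recommendation for a prediction problem."""
    if solvable_with_query:
        return "use a simple SQL query"       # prefer the simpler tool
    if n_rows < 1_000:                        # data quantity check
        return "not enough data for ML"
    if label_noise_rate > 0.4:                # data quality check
        return "labels too noisy for ML"
    return "proceed to modeling"

print(should_use_ml(50_000, 0.05, solvable_with_query=False))
# -> proceed to modeling
```

The point of writing it this way is that modeling is the last branch, not the first: the query check and the data checks come before any model does.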
But clearly there are many other steps before that which matter, and even after the modeling phase there is the testing phase. Fifth, you have this model in production, but is it really better than the systems that already exist? Is it actually improving what the business wants you to improve? For that, you'll pick some KPIs, key performance indicators, and run A/B tests comparing your modeled approach against the current one. So it's clear that machine learning is only a small step in this five-step process I've listed out. As a data scientist, every step is important, not just the modeling phase, so try to learn every step of that pipeline.

You need to know everything to become a data scientist. Also not entirely true. There definitely is a breadth of topics you need to know to break into the field, at least compared to some other fields I know of. Five topics you should know to at least a rudimentary degree when starting out are SQL, programming, probability, statistics, and linear algebra. Different companies require different levels of understanding of each. Some companies really want their data scientists to be programmers through and through, so they will check whether you're good at software engineering work, and then you can gradually transition into the data science space. Other companies are analytics heavy: they want you to understand what data is, what it means, and how to use that data to identify opportunities and problems, and then go and solve them.
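As a small example of where that probability and statistics knowledge shows up in practice, here is a rough sketch of the kind of A/B test mentioned earlier, using a two-proportion z-test on conversion counts. The conversion numbers and the helper name are made up for illustration; this is a sketch, not a full experimentation framework.

```python
import math

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test comparing conversion rates of variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF (via math.erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Made-up numbers: 120/2400 conversions for the existing system,
# 156/2400 for the new modeled approach.
z, p = two_proportion_z(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z = {z:.2f}, p = {p:.4f}")   # roughly z ≈ 2.23, p ≈ 0.026
```

In a real test you would pick the KPI, the sample size, and the significance threshold before launching, and libraries like statsmodels offer ready-made versions of this test.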
There definitely isn't a holy grail definition of what a data scientist should and should not know. Whatever combination of fields within the data science realm you're interested in, you can learn those and look for companies hiring the kind of data scientist you aspire to be.

Machine learning can solve all problems; machine learning is magic. That's also a great myth that is not so great. Machine learning strives to learn patterns between inputs and outputs in data, and there are situations where you just cannot, or should not, use it. I can divide these into two major categories. The first is a lack of data quantity: you just don't have enough data to recognize the patterns between inputs and outputs, and in such situations you really cannot use machine learning. The second is a lack of data quality. Because machine learning learns the mapping between inputs and outputs, even if you have a lot of data, a very noisy input-to-output mapping means the model can't really learn patterns no matter how much data you have. And even when you do meet the data quality and data quantity requirements, there's still a question to answer: you can use machine learning, but should you? This follows the principle of Occam's razor: use the simplest approach that delivers good results. If there are two approaches, one using a model and one using a simple query, and both have similar performance on the same problem, use the simpler SQL query approach. It's easier to maintain, easier to scale, and there are fewer hiccups that can happen along the way.

Data science is dead.
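To make the data-quality point from a moment ago concrete, here is a tiny simulation. The setup is entirely artificial and the noise rates are made up: the true rule is y = (x > 0.5), each observed label gets flipped with some probability, and even a model that knows the true rule perfectly cannot beat that noise ceiling.

```python
import random

random.seed(0)

def accuracy_of_perfect_model(noise: float, n: int = 100_000) -> float:
    """Accuracy of a model that knows the true rule, measured against
    labels that were flipped with probability `noise`. The best
    achievable accuracy is roughly (1 - noise)."""
    correct = 0
    for _ in range(n):
        x = random.random()
        true_y = x > 0.5
        observed_y = (not true_y) if random.random() < noise else true_y
        correct += (x > 0.5) == observed_y   # the perfect model's prediction
    return correct / n

for noise in (0.0, 0.2, 0.4):
    acc = accuracy_of_perfect_model(noise)
    print(f"noise={noise:.1f}  best achievable accuracy ≈ {acc:.3f}")
```

No amount of extra data changes this ceiling, which is why a very noisy input-to-output mapping rules out machine learning regardless of data volume.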
Now, I've read a few articles suggesting that data science is dead. Unfortunately, this perception comes from two parties that are at fault: data scientists themselves, and the people around them. The people around them, especially in industry, are often not willing to look to data science because they treat it as a magical black box. They don't really know what data science can do, and because of that, they won't try to incorporate data science ideas and tasks into their own workflows. On the other hand, there are cases where data scientists try to explain what they do using jargon. They explain jargon with more jargon, which only causes more confusion, and people lose trust in what data scientists can do. Because if they're speaking in jargon, I already don't understand, and if their attempt to make me understand still leaves me lost, there's no point in my trying to use a data science service at all. This is where data scientists should really value communication over any other skill. I've emphasized this before and I'll continue to emphasize it: communication is key. It's more important than any other aspect of the field, because the field is so nascent that there are many new ideas you need to learn and then distill for stakeholders so they understand why your work can help them.

So here's a tip for practicing communication skills, and it's a tip that also works for learning any topic of interest: the Feynman technique. Essentially, it involves taking a sheet of paper, picking a complex topic, and writing down an explanation of that topic in detail without using any jargon.
If you are able to explain the entire topic without using any technical keywords, then you can say you have truly understood it. If you find yourself leaving jargon in, that just means you have some holes in your knowledge. You can take this a step further, from learning to actually teaching, by looking at your jargon-free explanation and asking: did you lose any technical detail? If you were to explain it in a very technical sense versus a non-technical sense, how much of the detail can you retain in the non-technical version? This is what makes the technique so much better for teaching than just learning. And by teaching, I don't necessarily mean standing in a classroom; it's also great for communicating with stakeholders. If you distill information to the point of losing too much detail, some of the more curious stakeholders will ask questions you won't be able to answer, at least not in a way they would understand, and that brings back the same old problem of "I don't understand what you're saying, so bye-bye." I delve a lot into the Feynman technique in my 50K subscriber special, and I do recommend you check that out and try it for yourself. I've been using it in a lot of the technical videos I've made recently and it's been really good: it's improved my learning capacity and my teaching capacity, and it's also helping me communicate better at work with stakeholders.

And that's all I have for you today. Thank you all so much for watching. I hope you got an idea of what data science really entails and that you won't be fooled by these misconceptions.
If you have any more thoughts, please leave them in the comments down below. Also, do check out our Discord server in the description like I mentioned. Give this video a good old fat like on your way out, subscribe for more content, and I will see you very soon. Bye-bye.