I got my first data science job back in September 2021, and I remember, come the end of my first year, a lot of things that I wished I'd known before going into that first role. So in this video I want to go over some of the regrets I had, or basically the things I wish I knew before landing my first data science job. I hope this helps you avoid the things that I struggled with when you have your first role as well. Let's get into it.

When you're learning data science, chances are that you're working by yourself, so you probably have little need for version control for your code. Not to mention, if you're anything like me, learning things like Git is not as fun as learning things like neural networks, for example. However, I can promise you that nearly every data scientist in industry uses Git or GitHub in one way or another. In most companies you'll be working in cross-functional teams with analysts, software engineers and product managers, and chances are you'll all be working on the same code base, so the need for version control is paramount. To be honest, learning Git isn't very hard, and it shouldn't take you more than a week to get all the basic functionality down to a beginner level. freeCodeCamp has a great intro video to Git and it only takes about an hour to complete, so you can't go wrong with it. After you watch that video, I really recommend you set up your own GitHub profile and add some repos to it using the command line to really get some hands-on experience.

Now, the majority of beginner or student data scientists use notebooks as their main IDE, like Jupyter Notebook or Google Colab, and there's nothing wrong with this. Notebooks are great for visualizing data, really good for beginners, and overall a great kind of package for data scientists.
However, in industry most of your machine learning algorithms are not deployed in notebooks. They're deployed with software development toolkits like PyCharm and VS Code, on systems like AWS, with things like linters, Docker containers and unit testing, and unfortunately the things I just described aren't very easy to do in a notebook. I really wish I had upskilled more in software engineering basics and principles before going into my first role. I highly recommend that, if you're still learning data science, you implement some of your solutions in more developer and software engineering focused IDEs like VS Code or PyCharm. And if you want to learn things like Docker, unit testing or linting, then I have a whole series of articles I've written on Medium, which I'll link in the description below, that you can check out.

If you're anything like me, then pretty much everything in data science excites you. You know, I really wanted to learn reinforcement learning, deep learning, optimization, forecasting; I mean, the list goes on. Basically I wanted to learn everything, and I thought, going into my first role, that I was going to learn all these things and become an expert in pretty much everything in data science and machine learning. In reality, it's nearly impossible to learn all those things, and because there's so much active research going on, the pool of knowledge just gets bigger and bigger week after week. Instead, I realized it's best to focus on one domain or one specialty that best aligns with the area I was working in at the time. Focusing on one thing is how you make progress, and even then, after one year of continuous studying in one area, you still wouldn't properly be classed as an expert. It's not a bad thing to start your career by basing your skills in one area; you can always pivot at a later time.
When I got my first role, I was so excited about all the fancy machine learning algorithms I was going to build. In reality, building algorithms is only like five percent of the job. Being a data scientist is much more than simply coding algorithms; it's really more about gathering data, digging into it and trying to find what the data is really showing. I mean, it's literally in the title: data scientist. Most of my workflow was basically just manipulating data and getting hold of it, rather than using it to train a model. Data is by far the most important part. You can answer many business questions and generate a lot of value by simply looking at data and describing what it's telling you. You don't need to build machine learning algorithms all the time. Deploying algorithms has its time and place, and they do generate value. However, like I said, most of your time is spent simply looking at and analyzing data, and this is something I need to continue to remind myself of to this day, and something I wish I knew before going into my first job.

Data science is a highly technical field, so it's easy to think that to become a great data scientist all you need is to be very good at maths and coding. From my first year's experience, the best data scientists weren't the ones who basically knew everything about everything; they were the ones who could describe their findings and articulate them in a very clear way, and this is how they have influence within the company. You see, the models you build mean pretty much nothing unless you can explain, particularly to non-technical stakeholders, what they're actually doing behind the scenes. Senior stakeholders are not going to be too keen if they don't really understand what you're doing and you can't explain it. So the key here is to try to make your work as interpretable and as digestible as possible, because that's the way you're going to drive influence, really stand out and, basically, you know, move up the
ranks in the company. Now, the way I do this on a daily basis is that I always try to keep the business focus in the back of my mind and ask myself the question: how is the work I'm doing influencing the business? Often the answer is simpler than you think, and you don't need a PhD in quantum physics to figure this out.

In the real world there is no such thing as a nicely cleaned and predefined CSV file that you can simply read in and start building a model on. I had to learn this the hard way, as I was used to the nicely formatted CSV files that I was getting from Kaggle. You normally get a business problem, and it's your responsibility to frame it in a data science way. You have to understand the business requirements and the data, and find a solution to the problem. There's no indication of whether the problem will be a classification or a regression; these are things you have to work out for yourself from the business requirements given. Now, there are senior data scientists at hand to help you through this process, but I had to get used to working in this ambiguous way to deliver solutions to problems that weren't necessarily so clearly defined from the outset. It does get easier over time, particularly when you start developing a better business intuition for the domain that you're in.

Coming from university to a real job, I really had to adjust the speed and accuracy at which I was delivering my work. At university it's okay to work for a long period of time and make mistakes, because they don't really have any real-world consequences. However, when your projects actually influence business decisions, it's essential that you take your time and triple check your data and your findings. Trust me, I learned this the hard way. There was an assignment that I was given in my first year as a data scientist, and obviously I can't go into all the details behind the work and what I was trying to do, but the crux of the story was that I joined two tables incorrectly. My left join was formatted wrongly
and basically that led to the results I presented being pretty much completely incorrect. Fair to say I was really embarrassed, and because of that one experience I now make sure that I always double check, triple check, even check four times my left join, inner join, anything I do that merges two datasets together. I'm sure that if I had taken more time and validated my datasets, this error would have been avoided, but like I said, I learned it the hard way. So the crux of the story for you is: make sure you check all your data, particularly when you're merging two data frames or tables together. No one is going to get annoyed at you if you deliver your work on time and to a high standard. It's better than simply trying to get it in early and trying to show off, but getting it all wrong.

Adjusting to your first data science job will always be challenging, because it's a very different dynamic to work in than full-time studying. However, I think you can make it easier if you take into account the things I listed in this video and try to act on them during your learning.

If you want to hear more from me, then I run a weekly newsletter called Dishing the Data, which is all about becoming a better data scientist. I'll link it in the description below in case you want to check it out. If you enjoyed this video and want to see more videos like this on this channel, then make sure you click the like and subscribe buttons, and I'll see you in the next one.
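As a footnote to the join story above, here is a minimal sketch of one way to sanity check a left join: compare the row count of the left table with the row count after the join. If the join key is not unique in the right-hand table, a left join quietly duplicates rows, which is one common way results end up wrong. The table names, columns and the `left_join_row_counts` helper are all made up for illustration; they are not from the project mentioned in the video, and this is only one of several checks you might run.

```python
import sqlite3


def left_join_row_counts(conn, left, right, key):
    """Return (rows in the left table, rows after LEFT JOIN on key).

    Hypothetical helper for illustration. Table and column names are
    interpolated directly into the SQL, so only pass trusted names.
    """
    before = conn.execute(f"SELECT COUNT(*) FROM {left}").fetchone()[0]
    after = conn.execute(
        f"SELECT COUNT(*) FROM {left} AS l "
        f"LEFT JOIN {right} AS r ON l.{key} = r.{key}"
    ).fetchone()[0]
    return before, after


# Toy in-memory database with made-up data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10), (2, 10), (3, 20);
    -- customer 10 appears twice, so the join key is not unique on the right
    INSERT INTO customers VALUES (10, 'north'), (10, 'south'), (20, 'east');
    """
)

before, after = left_join_row_counts(conn, "orders", "customers", "customer_id")
if after != before:
    print(f"Row count changed from {before} to {after}: check for duplicate keys")
```

A left join should never drop rows from the left table, so any change in the count means the right table matched more than once for some key. If you work in pandas instead, `DataFrame.merge` has `validate` and `indicator` parameters that serve a similar purpose.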