Many people, both inside and outside the tech industry, get confused about what the different data roles actually mean and what the people in them actually do. This can make it quite difficult for someone looking to break into the field to know which role they should go for. So in this video I'm going to go over the key differences between a data scientist, a data engineer and a data analyst. Hopefully by the end of the video you'll have a better understanding of the various data roles and more of an idea of which one you'd most like to do. Let's get into it.

Before we dive into the tools and requirements behind these three data roles, it's really important to gain an overall understanding of how data generally flows within an organisation. The diagram on screen shows this process. Obviously it's not set in stone, and different companies will take different approaches, but this is broadly what I've seen in my own career and online.

The first step is collecting the data. This can be done through API calls, logging, and gathering information from your products or websites. After we have collected the data we need to store it, which can be done in multiple ways: databases, cloud storage such as Amazon S3 buckets, or even basic CSV files on your local servers. Even though we've now stored the data, it may not be in a nice, usable format. This is why we perform transformations to make it cleaner and more accessible to the end stakeholders. Once the data is clean, we can do some basic analysis to gain insight and help the business. Finally, the pinnacle of the data flow process is optimising decision-making in the business through predictive modelling and testing.

Now, different roles control different aspects of this data flow. For example, data engineers are mainly responsible for the collecting and storing stages.
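To make the collect, store, transform and analyse stages more concrete, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not how any particular company does it: the raw log lines, the field names and the in-memory CSV (standing in for an S3 bucket or a database) are all hypothetical, and the final optimisation stage is left out.

```python
import csv
import io
import json
import statistics

# Hypothetical raw log lines, standing in for data collected from an API or product logging.
RAW_EVENTS = [
    '{"user": "a", "page": "home", "load_ms": "120"}',
    '{"user": "b", "page": "pricing", "load_ms": "340"}',
    '{"user": "a", "page": "pricing", "load_ms": null}',
]

FIELDS = ["user", "page", "load_ms"]


def collect() -> list[dict]:
    """Collect: parse each raw JSON log line into a record."""
    return [json.loads(line) for line in RAW_EVENTS]


def store(records: list[dict]) -> str:
    """Store: serialise records as CSV text (a stand-in for a file, bucket or database)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)  # None values are written as empty strings
    return buf.getvalue()


def transform(csv_text: str) -> list[dict]:
    """Transform: drop rows with a missing load time and cast load times to int."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"user": row["user"], "page": row["page"], "load_ms": int(row["load_ms"])}
        for row in reader
        if row["load_ms"]
    ]


def analyse(rows: list[dict]) -> float:
    """Analyse: a basic insight, the mean page load time in milliseconds."""
    return statistics.mean(row["load_ms"] for row in rows)


if __name__ == "__main__":
    print(analyse(transform(store(collect()))))
```

Each stage only consumes the previous stage's output, which mirrors how responsibility can be handed from one role to the next.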
Data analysts are mainly responsible for the analysis stage, and data scientists typically look after the optimisation step. I do want to stress that even though some of these roles are more suited to certain steps in the data flow, there isn't a hard barrier between where one role starts and another ends. It's more fluid than that. A data scientist may find themselves doing analytical work now and then. Likewise, a data analyst may find themselves doing more of the storing part, depending on how the data team is set up. It really depends on the organisation and how the company is structured. At bigger companies you'll typically be much more of a specialist, whereas at smaller companies like startups you're expected to be more of a jack of all trades, and you may even own the whole data flow process end to end.

There's one more key thing to be aware of: different companies use different names for these data roles. A data scientist at one company may be a data analyst at another, or vice versa. So it's really important that you read the job descriptions to understand what you're applying for and what your responsibilities and day-to-day tasks will look like.

Anyway, let's now break down the key requirements and tools you need for each role.

As we just discussed, data engineers are the foundation of the whole data ecosystem, because they're the ones who acquire and store the data that analysts and data scientists need to do their work. The overall goal of a data engineer is to build sustainable, robust data pipelines that serve the business, whether that's the data team, the tech team or other stakeholders. To be a data engineer you need coding skills in Python and SQL, and knowledge of things like Java, R or NoSQL databases also helps.
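As an illustration of what a small pipeline can look like, here is a minimal extract, transform, load (ETL) sketch in Python and SQL, using the standard-library `sqlite3` module as a stand-in for a real warehouse. The source rows, table name and cleaning rules are hypothetical. Keying `INSERT OR REPLACE` on `order_id` is one simple way to make re-runs idempotent, which is part of what makes a pipeline robust.

```python
import sqlite3

# Hypothetical source rows, e.g. pulled from an orders API; amounts arrive as strings in pence.
SOURCE_ROWS = [
    {"order_id": "1", "country": "uk", "amount_pence": "1999"},
    {"order_id": "2", "country": "US", "amount_pence": "500"},
    {"order_id": "2", "country": "US", "amount_pence": "500"},  # same order delivered twice
    {"order_id": "3", "country": "de", "amount_pence": ""},     # missing amount
]


def extract() -> list[dict]:
    """Extract: fetch raw rows from the (hypothetical) source system."""
    return list(SOURCE_ROWS)


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop incomplete rows and duplicates, upper-case countries, convert pence to pounds."""
    seen = set()
    clean = []
    for row in rows:
        if not row["amount_pence"] or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        clean.append((int(row["order_id"]), row["country"].upper(), int(row["amount_pence"]) / 100))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: upsert into the target table so that re-running the pipeline is safe."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, country TEXT, amount_gbp REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> None:
    load(conn, transform(extract()))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn)
    print(conn.execute("SELECT * FROM orders").fetchall())
```

In production this logic would usually run on a schedule under an orchestration tool rather than as a one-off script.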
Obviously a big part of the role is storing data, so you need to know about databases, data lakes, data warehouses, cloud computing, and the various file formats you can store data in, such as CSV, Parquet or Delta files. Like any tech profession, it's really important that you're proficient with the command line so you can edit files, execute scripts, and write basic Bash or Z shell (zsh) scripts. Data engineers also spend a lot of time building ETL pipelines, where ETL stands for extract, transform and load, and they typically use an orchestration tool to schedule and manage them. The most common one is Apache Airflow, but there are others out there, so being familiar with at least one of these frameworks is very useful. Finally, most companies, particularly tech companies, now use some form of cloud computing, so learning a platform like AWS, Microsoft Azure or Google Cloud will be very handy. This is not an exhaustive list, as data engineers' tools and requirements vary between companies. I'm not a data engineer myself, so I'll refer you to the Coursera article on screen here, which is also linked in the description below, explaining how you can become a data engineer if this is the role that interests you most.

A data analyst's main role is to extract meaningful insight from data to help the business make decisions. As an analyst you'll typically be much closer to the business side of things than a data engineer, so good domain knowledge is really important. You also need to be proficient in SQL because, well, it's the language of data. Python also helps, but it's not an essential requirement, particularly for entry-level roles. As well as SQL, you also need to know Excel. Microsoft Excel needs no introduction: it's used in nearly every profession out there, and it's a really good tool for pretty much any analysis you want to do.
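A typical first task for an analyst is an aggregation query in SQL. The sketch below assumes a hypothetical `sales` table, loads it into an in-memory SQLite database through Python's standard `sqlite3` module, and summarises revenue by region with `GROUP BY`. The same query would run, with minor dialect changes, against most warehouses.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, month TEXT, revenue REAL);
        INSERT INTO sales VALUES
            ('North', '2024-01', 1200), ('North', '2024-02', 1500),
            ('South', '2024-01', 900),  ('South', '2024-02', 700);
        """
    )
    return conn


# Total and average monthly revenue per region, largest total first.
REVENUE_BY_REGION = """
SELECT region,
       SUM(revenue)           AS total_revenue,
       ROUND(AVG(revenue), 1) AS avg_monthly_revenue
FROM sales
GROUP BY region
ORDER BY total_revenue DESC
"""


def revenue_by_region(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(REVENUE_BY_REGION).fetchall()


if __name__ == "__main__":
    for region, total, avg in revenue_by_region(build_demo_db()):
        print(f"{region}: total {total}, monthly average {avg}")
```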
As an analyst you'll typically present your data to stakeholders, and to do this you'll need some sort of visualisation tool. The most common approach is dashboards built with platforms like Power BI and Tableau. Depending on the structure of the organisation, you may be required to do some experimentation, such as A/B testing, so it's quite important that you have decent skills and knowledge in statistics. Finally, you'll often be presenting your findings to non-technical stakeholders, so it's really important that you have clear communication skills and can explain your findings and complex patterns in an easily digestible way. Again, this list is non-exhaustive, and different companies may require different tools and skills for their data analysts. Coursera has a great article detailing how to become a data analyst if this is the role that interests you most.

As a data scientist, your main role is to create predictive and machine learning models to help the business make decisions. You may be doing forecasting, optimisation or even deep learning, depending on the organisation you're in and the team you work with. To become a data scientist you need really good coding skills in Python and SQL. Python is required because many machine learning libraries are built with Python in mind, and SQL is needed because you have to retrieve and transform the data you build your models with. To build those models you need an understanding of machine learning and its various algorithms. I have a whole video explaining how I would learn machine learning if I were starting again, which I'll link on screen here. To really understand what machine learning algorithms are doing, you need a good intuition for the underlying maths and stats: things like calculus, linear algebra and Bayes' theorem.
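To show where calculus enters machine learning, here is a minimal sketch that fits a straight line, about the simplest predictive model there is, by gradient descent on mean squared error. The partial derivatives of the loss with respect to the slope `w` and intercept `b` drive each update. The learning rate, epoch count and toy data are illustrative assumptions; in practice you would reach for a library such as scikit-learn rather than hand-rolling this.

```python
def fit_line(xs: list[float], ys: list[float], lr: float = 0.01, epochs: int = 5000) -> tuple[float, float]:
    """Fit y ~ w*x + b by minimising mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # MSE = (1/n) * sum((w*x + b - y)^2); these are its partial derivatives.
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step downhill, against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w: float, b: float, x: float) -> float:
    return w * x + b


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [3, 5, 7, 9, 11]  # toy data generated from y = 2x + 1
    w, b = fit_line(xs, ys)
    print(f"w={w:.3f}, b={b:.3f}, prediction at x=10: {predict(w, b, 10):.2f}")
```

With this toy data, `fit_line` should converge to values very close to `w = 2` and `b = 1`. Too large a learning rate would make the updates overshoot and diverge, which is one reason the underlying maths matters in practice.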
Again, I've done a whole video explaining the maths and stats you need as an entry-level data scientist, which I'll link on screen here. A lot of machine learning models are deployed on cloud systems, so being familiar with AWS, Azure or Google Cloud is very useful. Data scientists often work quite closely with production code, so a good understanding of the command line (Bash, Z shell, editing files and running commands) is also very useful. Finally, as a data scientist you'll frequently communicate your results to non-technical stakeholders, so clear communication skills are a must. As a data scientist myself, I've written loads of articles and made several videos on how to break into the field. I'll link these in the description below for you to check out.

So the grand old question: which data role should you pick? Well, it really depends on your skills, your previous experience, and what you're most interested in. If you really like maths and stats, the data scientist role is probably best suited to you. If you're more driven by the technical and coding aspects, a data engineer role would probably be ideal, and if you really enjoy the business side of things, a data analyst role is probably the way to go. No matter which one you choose, remember that the roles and responsibilities will vary between companies irrespective of the job title. So make sure you read the job descriptions thoroughly to understand exactly what you'll be doing day to day.

If you're keen on the data scientist role, you might be interested in a newsletter I run called Dishing the Data, which is all about becoming a better data scientist and my personal experience as a practitioner in the field. It's linked in the description below in case you want to check it out.
If you enjoyed this video, make sure you click the like and subscribe buttons, and I'll see you in the next one.