The job of a data scientist is to make sense of information. What does the data tell us? Are there any patterns? And more importantly, can it help us forecast the future? But what if we need to predict the future every day, every hour, or every minute, and do that for thousands of people simultaneously? Say, predict the taxi arrival time. There's one specific role that builds a bridge between data science and its practical counterpart, machine learning. Meet the machine learning engineer. Well, the role of a machine learning engineer is to use machine learning to somehow bring additional value to the business or to the product. This is Oleksandr Kondiforov, a data science competence lead at AltexSoft. In most cases, it means that we are building some features for products or automating some workflows. So, for example, building decision support systems or fully automated decision systems. The key word here is product. A machine learning engineer always focuses on shipping a working piece of software. And this product uses machine learning. Sounds simple, but how does that translate into practice? Let's imagine we have a product team that builds a ride-hailing app. What we want is an algorithm that will accurately predict pickup time for the customer. We can calculate pickup time based on distance and average speed without machine learning, using a simple rule-based system. But there are plenty of variables that may skew the results. Rainfalls or blizzards, traffic congestion, and road incidents all affect the arrival time. With a rule-based system, a software engineer would have to consider all possible factors and write code for them. There are so many of those that there's no way to write rules for everything. Then how can an ML engineer help? They can build a model that learns the relations within the data by itself and then gives us a more accurate prediction if we supply it with the necessary data.
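To make the rule-based idea concrete, here is a minimal sketch in Python. The average speed, the weather multipliers, and the function name are all assumptions made up for illustration; they don't come from a real ride-hailing system. It shows why the approach gets unwieldy: every factor needs its own hand-written rule.

```python
# Rule-based pickup estimate: distance divided by an assumed average speed,
# then adjusted by hand-written rules. All numbers are illustrative assumptions.

AVERAGE_SPEED_KMH = 30.0  # assumed city-wide average speed

# Each multiplier is a rule a developer had to think of and tune by hand.
WEATHER_FACTOR = {"clear": 1.0, "rain": 1.25, "blizzard": 1.5}


def rule_based_pickup_minutes(distance_km, weather="clear", heavy_traffic=False):
    """Estimate pickup time in minutes from distance plus fixed adjustments."""
    minutes = distance_km * 60.0 / AVERAGE_SPEED_KMH
    # Weather without a rule falls back to no adjustment at all.
    minutes *= WEATHER_FACTOR.get(weather, 1.0)
    if heavy_traffic:
        minutes *= 1.5
    return minutes


print(rule_based_pickup_minutes(5.0))
print(rule_based_pickup_minutes(5.0, weather="rain", heavy_traffic=True))
```

Road incidents, events, or time of day would each need another branch here, which is the gap a learned model is meant to close.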
That said, let's talk about an ML engineer's responsibilities. An ML engineer will start with choosing and preparing data. Let's assume that there are several variables we need to calculate pickup time: the distance from the customer to the driver, speed, weather, and traffic congestion, to name a few. All of these can become features, the data attributes a model uses to give us prediction results. To get this data, an ML engineer will have to analyze historical records of previous pickups that contain those variables. Choosing the right data and consolidating it is the first step in preparation. Then the ML engineer would clean the errors from the data, fill in the missing entries, and transform records into a single format. Once the data is ready, an ML engineer needs to choose an algorithm that would fit the task. The choice depends on the type of data, the expected predictive accuracy, and how resource-intensive the model is. You may need deep neural networks to process images and videos with 98% accuracy, but training them would require renting clusters of GPUs. And running those models in production may require specific AI-optimized processing units. But sometimes good old decision trees would be enough. The ML engineer would experiment with several models and a subset of data to find the one that fits the task, and then start model training. During the training process, a model will learn to make predictions by finding patterns in the training data set. You also need a testing set of historical data to evaluate whether the model gives accurate forecasts. If it passes the test, congratulations! We have a model that can make predictions. But the model isn't a part of our product and our customers can't use it yet. So now an ML engineer comes to productionizing the model and deploying it. Here is our taxi application. Or in this case, two client applications used by drivers and customers, and our server where all the backend logic sits. Now we need to deploy the model.
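Before moving on to deployment, the preparation, training, and testing steps just described can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the records are synthetic, the field names are hypothetical, and a one-feature least-squares line stands in for the model (it is not a decision tree or a neural network, and in practice a library such as scikit-learn would usually be used instead of hand-written fitting).

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def make_raw_records(n=100):
    """Invent historical pickups; every value here is synthetic."""
    records = []
    for i in range(n):
        km = rng.uniform(1.0, 10.0)
        minutes = 2.0 + 3.0 * km + rng.gauss(0.0, 0.5)
        if i % 10 == 0:    # simulate a missing distance entry
            records.append({"distance": None, "unit": "km", "minutes": minutes})
        elif i % 10 == 1:  # simulate a record stored in metres
            records.append({"distance": km * 1000.0, "unit": "m", "minutes": minutes})
        else:
            records.append({"distance": km, "unit": "km", "minutes": minutes})
    return records


def clean(records):
    """Convert distances to kilometres and fill missing ones with the mean."""
    km_values = []
    for r in records:
        d = r["distance"]
        if d is not None and r["unit"] == "m":
            d = d / 1000.0
        km_values.append(d)
    fill = mean(v for v in km_values if v is not None)
    return [(fill if v is None else v, r["minutes"])
            for v, r in zip(km_values, records)]


def fit_line(pairs):
    """Ordinary least squares for minutes = intercept + slope * km."""
    xs, ys = zip(*pairs)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def mean_absolute_error(pairs, slope, intercept):
    """Average gap, in minutes, between predictions and actual pickups."""
    return mean(abs(y - (intercept + slope * x)) for x, y in pairs)


cleaned = clean(make_raw_records())
rng.shuffle(cleaned)
split = int(len(cleaned) * 0.8)  # 80% for training, 20% held out for testing
train_set, test_set = cleaned[:split], cleaned[split:]

slope, intercept = fit_line(train_set)
test_mae = mean_absolute_error(test_set, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} test MAE={test_mae:.2f} min")
```

The held-out test set plays the role of the "testing set of historical data": the model never sees it during fitting, so the error on it is a fairer estimate of how forecasts will behave on new pickups.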
Machine learning models are usually deployed as a microservice, an isolated container where the code has all its dependencies and can run as a standalone unit. So an ML engineer wraps the model into a container and deploys it on the server. Then he or she needs to connect the model to data sources. The applications will supply some part of the data, like driver and customer geolocation, current speed of the car, and so on. We'll also need extra data like traffic incidents, jams, or weather that comes from a separate database. From this point, the model can consume the required data, calculate a prediction, and send it back to the customer. But here is another problem. Remember, we tested the model on historical data. But how well does it work in real-life conditions? You need to track its performance. And this is one of the main concerns of a machine learning engineer: model performance monitoring and evaluation. Let's say the model predicted a taxi would arrive in 14 minutes while it actually took 20 minutes. To capture this, an ML engineer would set up monitoring infrastructure that compares real-world data to the model's predictions to understand its accuracy and how it changes over time. Monitoring systems provide ML engineers with the data they need to decide whether the model performs well and whether it needs retraining. So what is retraining? As real-world conditions change, the model can require new data. Say, a large part of a major city highway was closed for reconstruction, which made drivers reach their destinations later. The model started predicting pickup time less accurately because it was trained on outdated data. And if the ML engineer has monitoring systems set up right, they will show this drift. Such changes call for training a new model with fresh data. Since real-world conditions may change daily, retraining often becomes a daily task for a machine learning engineer, so it makes sense to automate this process.
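Here is a rough sketch of what such monitoring could look like in Python. The class name, the window size, and the three-minute threshold are hypothetical choices for illustration, not a standard tool or a recommended setting. It keeps a rolling mean absolute error between predicted and actual pickup times and raises a flag when that error stays above the threshold.

```python
from collections import deque


class PickupTimeMonitor:
    """Tracks the rolling mean absolute error of pickup-time predictions."""

    def __init__(self, window=50, max_mae_minutes=3.0):
        self.errors = deque(maxlen=window)  # keeps only the latest trips
        self.max_mae_minutes = max_mae_minutes

    def record(self, predicted_minutes, actual_minutes):
        """Store the absolute error of one completed pickup."""
        self.errors.append(abs(actual_minutes - predicted_minutes))

    def rolling_mae(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_retraining(self):
        # Judge only once the window is full, so a few odd trips don't trigger it.
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and self.rolling_mae() > self.max_mae_minutes


monitor = PickupTimeMonitor(window=5, max_mae_minutes=3.0)

# Normal conditions: predictions are within about a minute of reality.
for predicted, actual in [(14, 15), (10, 9), (12, 13), (8, 8), (11, 12)]:
    monitor.record(predicted, actual)
print(monitor.rolling_mae(), monitor.needs_retraining())

# After the highway closure, trips take far longer than predicted.
for predicted, actual in [(14, 20), (10, 16), (12, 19), (9, 14), (11, 18)]:
    monitor.record(predicted, actual)
print(monitor.rolling_mae(), monitor.needs_retraining())
```

In an automated setup, the retraining flag would be the signal that kicks off a new training run on fresh data rather than something an engineer checks by hand.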
As you can see, the ML engineer is generally responsible for, well, the whole machine learning part of the product, from data analysis to the moment the model is trained and launched in production. So what would the typical background and skill set of an ML engineer look like? First, it's statistics, data analysis, and applied mathematics. As ML engineers curate features and prepare data, the fundamentals are critical. As you've probably guessed, these specialists must also know existing machine learning algorithms and common architectures. Decision trees, support vector machines, naive Bayes, and deep neural networks are a few popular algorithms used in ML applications. To train those models, engineers have to be familiar with common tools. Python is the main programming language used in data science. ML engineers may also be proficient in R to explore and visualize data. Similar to software engineering, ML has a number of frameworks and libraries that specialists use to streamline their work. One of the main ones is scikit-learn, which is a Python-based library featuring a variety of machine learning algorithms. As deep learning becomes a universal answer to any ML problem, it has its own library, TensorFlow. But what about skills required for production engineering? Normally, ML engineers are required to know high-performance languages like Java and C++ to run models on the server. If they work with big data architectures, ML engineers must be familiar with distributed computing frameworks like Hadoop and data processing tools like Apache Spark. And if the product actively uses deep learning, the engineer may need to know how to configure parallel GPU computing platforms such as NVIDIA CUDA. So where do machine learning engineers come from? Obviously, you'd expect them to have a computer science education. Some engineers transition from software development, while others start with data science and analytics and then acquire engineering skills.
But this set of skills sounds like a data scientist's, right? Then what's the difference between them, and when specifically should you hire an ML engineer? Data scientists and machine learning engineers have quite a lot in common. And in fact, in many companies, these titles are used interchangeably. And it's actually up to the company whether to call their specialists data scientists or machine learning engineers. For example, data scientists might not actually use machine learning to do their everyday job. So for example, they can be doing some data analytics or A/B testing, or applying algorithms and statistics to data. In other words, data scientists don't necessarily work directly with productionized machine learning models. Sometimes they only focus on analytical tasks. For instance, our ride-hailing company may employ data scientists in addition to ML engineers to explore new markets and assess the viability of expanding there. At the same time, machine learning engineers tend to be more engineering-savvy. In most cases, they probably build some kind of machine-learning-based features for products, like, I don't know, Google or Netflix recommendations or search. Also, it might be easier for machine learning engineers to actually productionize their models and integrate the results of their work with other parts of the system. So the production part is what can draw the line between data scientists and machine learning engineers. The latter definitely train, launch, and maintain ML models. Data scientists may not do that. What about data engineers? The responsibilities of a machine learning engineer will also overlap with those of a data engineer, a tech professional who focuses on transferring data from one system to another, managing databases, and working with data transformation tools. So data engineers are closer to software engineers.
So they obviously work with data: they build data pipelines, some streaming, processing, caching, whatever. They aren't required to actually know machine learning. ML engineers and data engineers often cooperate in running the data infrastructure that supports machine learning. Back to our example. An ML engineer is likely to define specifications for a database to keep information on traffic incidents, jams, or weather. In turn, data engineers can use these specifications to upload data to a database and connect it with the model. And there you have it. If you're aiming at running machine learning models in production, you're looking for an ML engineer. To set up data infrastructure and databases, you'd look for a data engineer. And if you need deep data research and analysis without necessarily running machine learning, you should consider data scientists. Of course, it's hard to draw clear lines to separate these three roles, but this distinction should make things a bit easier for you. To learn more, watch our videos on data science teams and data engineers. Thank you for watching and stay tuned.