Thanks, Kru, thanks, Joy, for organizing this, and thanks to everyone for turning out. A lot of people are interested in this topic, and you're a very diverse crowd: students, tech leads, data analysts. Today's topic is a very exciting field that we should all get our hands on, to get a feel for how it can help your work. Engineers, of course, like coding and want machine learning to help solve problems. But for data analysts, managers, and even technical leads, being able to understand data, analyze it, and use it to make decisions is an essential skill as well. So let's get started. Today I'm going to go through some best practices in ML, using a running project, a sign language detector, to illustrate all of them. In this session you will learn best practices: how to pick a metric, how to pick an approach if you are the lead of a project, and how to use your data. The running example of this talk is, of course, a sign language detector. Why is machine learning important here? For this project it's quite easy to see. Looking at this picture, as a coder I can't really write a rule that says "if this pixel is white, I'll decide this is a zero," or code this with some pile of if-else statements. This needs an understanding of the picture, the ability to recognize a pattern, just like a human brain does. Machine learning does this for us, and it needs data to train on; that's the training part. So it's important for this project. But what about problems that aren't about pictures or vision? Decision-making problems, say. Some of you work on ads or marketing; those aren't decided by a picture. A lot of them are decided by things like which age group a user is in, or whether a topic interests the user. Machine learning is important there too: with more and more data coming into your website, and all of us living in the online world, you need machine learning to understand those patterns and help you make decisions. Let me first go over some terminology so that everyone is on the same page. These are the terms you meet in a first machine learning course: binary classification, multi-class classification, ranking, and tasks like regression or clustering. So what about our problem? Which kind is it? For us, it's definitely classification: for each hand sign, we want to determine a word in the language. But is it binary? No, it's not; there are many possible outcomes, so this is multi-class classification. Next, some problems are supervised and some are unsupervised. Supervised means the training data carries a label telling us the answer: when we do training, we label each example, is it a one, is it a zero, and we use that to train our algorithm. Unsupervised is when the data we got for training has no label on each example. These are also standard terms in machine learning.
I won't go over all of them one by one, but one common term is "label," which I just used. A label is, for a training example, the result we want to predict: true or false, say whether a patient has a disease, or in my case, which word this picture of a hand sign represents. "Features" are the different values in your training data that help your machine learning algorithm learn to predict this label. For our example, the features are the pixels of the pictures; for other machine learning problems it could be a user's attributes: their age, their income, et cetera. The rest I will go over as they come up in later examples. So in our example, the picture of a hand is our instance. The label is a word, say "two," and we represent it as a vector: when "two" is the true class, we put a one in that position, and the others are zero. We only ever predict one of them as true, so this can be a very long vector, one entry for each word in the vocabulary of the sign language. When you make a hand sign, we take the picture of it and predict which position should be the one; that's our output. The features are all the pixels of the image, usually 32 by 32, the height and width of the picture. And what is this three? It's the three color channels of each image: red, green, and blue. One picture is really three pictures stacked together, which gives us the color. "Model": later on we'll come up with several models for this prediction; one example could be a random forest trained on the data. A metric could be accuracy: of course we want to predict as many examples correctly as possible. "Pipeline" I'll explain later. So now let's talk about data, and the usual methodology in machine learning. When we have some data in hand, we want to divide this data up for different purposes. Why divide it? Well, if I have 100 pictures and use all 100 of them to train my algorithm, there's nothing left to validate my performance on. So when you do machine learning and collect tons of data, you want to set some data aside for validation and testing. There's some science behind how to divide it, but there are rules to follow. One, of course: don't overlap the sets. Data used for training is only for training. Don't use the training data to validate, because you trained your algorithm toward that data: it will perform well there, but you want your algorithm to perform well on data it hasn't seen before in the real world, when new pictures come in. That's why you set aside data that is never used for training. No overlap. Second, validation and test sets need to be realistic: they should represent the real data you want to perform well on in the real world. And make sure they come from the same distribution. This one is actually a bit hard to see at first, but I'll explain it with an analogy in a moment.
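To make the split concrete, here is a minimal sketch using scikit-learn and Keras utilities. The file names, the 15% split sizes, and the use of NumPy arrays are illustrative assumptions, not from the talk; the 1,000-word vocabulary matches the number mentioned in the Q&A later.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# Hypothetical files holding the collected pictures and their word labels.
X = np.load("hand_signs.npy")  # shape (num_examples, 32, 32, 3): height, width, RGB channels
y = np.load("labels.npy")      # shape (num_examples,): integer word index per picture

# Carve out a held-out test set first, then split the rest into train/validation.
# The three sets must not overlap and should come from the same distribution.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.15, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.15, random_state=0)

# One-hot label vector: a 1 in the position of the true word, 0 everywhere else.
y_train_onehot = to_categorical(y_train, num_classes=1000)  # 1,000-word vocabulary
```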
Now the analogy: your algorithm is an archer that you are training to aim at a target and hit the center. The validation set is where you place that target; it's what your algorithm shoots for. So if your validation set and your test set are totally different, not from the same distribution, they don't represent the same problem. If I train my sign detector on pictures of hand signs, and then the test set is not hand signs anymore, maybe signs made with a pen or some other object, my model is not going to perform well. What's the problem here? You weren't training toward your real goal. So make sure your validation set matches your real goal: if your real goal is to predict hand signs, collect hand sign data, train on it, and validate on it. That's the second rule. And third: no decisions based on the test set. The test set exists to predict how your model will perform on data it has never seen before, like when it goes to production in the real world and receives pictures it has never seen. So you set this test set aside and only use it to measure how well your model will perform in the real world. This little flow diagram shows the usual life cycle of a project. We have training examples, and we use them to train a model. Once we have the model, remember we also have validation examples, not used for training: we use the model to make predictions on them, and we see how well it performs on data it wasn't trained on, measured with, for example, accuracy. If something is not up to standard, not as expected, we iterate on the model: do I need some new features? Do I need to make the architecture more complex? Sometimes we might need to collect more data; that could fix the problem. We go around this little circuit multiple times until we find a model that we think performs well on the validation examples, and we make a decision on the model: we pick the best we can find. Then we want to see how it works on data it has never seen before; that's where the test examples come in. We've never used them before, so we use them for prediction and get a test metric that represents how well the model will work in the real world, on data it was never trained or validated on. And once your model goes to production, it keeps predicting on real-world data. For my project, if I build a camera around my algorithm, the camera could be set on a door: someone comes into the room, makes a hand gesture, and the camera predicts what sign language it is and records what they are trying to say. One more thing about production: our model is not finished when we deploy. Why not? What can change? Your data can change, because your application sits in the real world, maybe in an app, maybe in a camera, in my case in a room, and the data can shift. Maybe your users are growing up and are no longer teenagers; maybe your app is now used by some other demographic, in new use cases you hadn't expected. What you trained on before is no longer fresh, no longer current with your new production data. Make sure you monitor this drift in performance. If your performance starts to decay, that's when you might need to collect new data and train again, iterating once more.
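As a minimal sketch of that circuit, reusing the hypothetical split from above: iterate over candidate models against the validation set, then touch the test set exactly once at the end. The two scikit-learn models here are stand-ins I chose for illustration; the talk doesn't prescribe them.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Flatten each 32x32x3 picture into one feature vector for these simple models.
flat = lambda X: X.reshape(len(X), -1)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100),
}

best_model, best_val_acc = None, 0.0
for name, model in candidates.items():
    model.fit(flat(X_train), y_train)                            # train on training examples only
    val_acc = accuracy_score(y_val, model.predict(flat(X_val)))  # decide using validation examples
    if val_acc > best_val_acc:
        best_model, best_val_acc = model, val_acc

# The test set is used once, at the very end, to estimate real-world performance.
test_acc = accuracy_score(y_test, best_model.predict(flat(X_test)))
```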
Next, evaluation metrics. This slide looks a little complex, with a lot of equations, but it's material usually covered at university. Say a patient is positive if they have a disease and negative if they don't, and our algorithm predicts whether they have the disease, true or false. Hold on a second, let me bring up my cursor so it's easier for you to follow. So, the prediction can be positive or negative, and this table is called a confusion matrix. When someone truly has the disease and our algorithm predicted positive, that's a true positive, predicted correctly; the other three cells are divided by the same rule. There are some common metrics. Accuracy means: out of all the predictions we made, which fraction did we get right? We got the true positives and true negatives right, so accuracy is true positives plus true negatives divided by the whole, all four boxes. Precision and recall: precision asks, when we predicted positive, how many of those did we get right; that's this first row. Recall asks, when patients really do have the disease, how many of them did we find; that's this column. There's also a very common metric called the F1 score; I'll explain why it comes up in a moment. The other is the area under the ROC curve. On the ROC curve, the y-axis is the recall (the true positive rate) and the x-axis is the false positive rate. We want as much recall as possible and as few false positives as possible, so we want to shoot for this top-left corner. If one model's curve sits like this, and another model's curve sits higher, the second model is better. That's what the area under the curve measures: the larger the area under the curve, the better your model. That's another metric. So, choosing a metric. Sometimes we want our product to be perfect, so we're tempted to choose a lot of metrics, but that's not a very efficient way to work: if you are a startup, you want to iterate on your model as fast as possible. Pick a single metric to optimize; it will help you iterate faster. But what if I really do have multiple metrics that I care about? Try to combine them. The F1 score, which I mentioned, is the harmonic mean of precision and recall, this row and this column, so it combines both in one number. One curious thing: why does the F1 score not care about true negatives? Why do they get excluded? Back to the patient example: if it's a rare disease, like a cancer, not many people have it, so this negative box is going to be big, with a lot of people in it. A true negative means your algorithm correctly predicted that someone doesn't have cancer; that's going to be a very large number, and it would swamp your metric. You want to ignore it, because what your algorithm cares about is finding the people who truly have the disease. So the F1 score deliberately ignores that part.
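A minimal sketch of computing all of these with scikit-learn, on a made-up batch of labels and scores (1 = has the disease; the numbers are fabricated purely for illustration):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true  = np.array([1, 0, 1, 1, 0, 0, 0, 1])          # made-up ground-truth labels
y_score = np.array([.9, .2, .6, .4, .1, .7, .3, .8])  # made-up model scores
y_pred  = (y_score >= 0.5).astype(int)                # threshold the scores at 0.5

accuracy  = accuracy_score(y_true, y_pred)    # (TP + TN) / all four boxes
precision = precision_score(y_true, y_pred)   # of the predicted positives, how many were right
recall    = recall_score(y_true, y_pred)      # of the real positives, how many we found
f1        = f1_score(y_true, y_pred)          # harmonic mean of precision and recall; ignores TN
auc       = roc_auc_score(y_true, y_score)    # area under the ROC curve
```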
Another option is to weight the metric across different demographics; that's also very commonly used. Sometimes your metric is not uniform: you care about some cases more than others, because some demographics are more important to your business, so you give them a higher weight. You can also define constraints, even though you picked a single metric. Say: I want to maximize recall, but I want to make sure my precision stays above 0.95. Another one: maximize my area under the curve, but make sure the running time at prediction doesn't take too long, say under one second. For me, this project doesn't have that much business complexity, so I can pick accuracy as my metric, but I do have a constraint: I want this to be useful in a camera setting, so I want to be able to predict a sign in less than 0.5 seconds. That's my constraint, and that's basically what I picked. So now I'd like to illustrate how you run a project. Imagine you are the CEO of a sign-language-detection camera startup. For this exercise we focus on the algorithm, not the hardware of the camera. What will you lead your team to do first? How do you start this project? Is it about coming up with a model, or coming up with data? Let's find out. For the audience, I'll give you maybe five minutes to input your answers, and when we come back, we'll discuss them together. For those of you who are not familiar: go to menti.com in your browser, key in the seven-digit code, and you can answer the question. I don't have a prize for you, but do participate in the discussion. It would be great if you had a prize, though. Join next time, I'll prepare prizes. Okay, I'll give you guys some time. Meanwhile, while the answers populate, can you answer one or two questions? Sure. So one of them is asking: when do we need to check for ML model drift? Usually, you want to do it continuously. When our model is out in the wild, we want to log some data about it and monitor it. So it's a regular, continuous monitoring process; it's not that we decide on a fixed number of days. Okay, good. Bodong is asking how it works: is the sign language a sequence? Right, so I think you mentioned just now that it's a camera setting, where the camera looks at the different signs and then has to make the prediction within half a second, is that correct? Yeah, within half a second. Let's simplify it for now: say the task is taking one picture and predicting the single word that sign represents, within 0.5 seconds. Okay, so that means when you're doing the training, it's all static images, pictures of all the different signs; but when you implement it, it will actually be through a camera video. Yeah. Okay. Sorry, Bodong, what was that? So, some sign languages use a sequence: maybe two gestures form a word, or three form a sentence. Does it work that way, or is it basically just using signs to spell out words, like A, B, C, D? Oh, I think...
Is the sign language the type that spells out words letter by letter, or the type where signs actually communicate meaning? I think in sign language, a different sign usually represents a different word, and then you put them together and they form a sentence. So it's not the type where, say, a peace sign means "peace," or moving one shoulder a certain way is a certain word. And it's not spelling either; spelling letter by letter would take very long. Sign languages try to be as efficient as possible: you don't want to sign character by character, you want a sign to represent an object, several words, some meaning. I think that's what Bodong is asking about. So it's not creating sentences letter by letter like A, B, C, D, E; a single sign could be "apple" or "orange." But in sign language there's normally a sequence, right? Maybe three gestures form one word? Here, it's just one picture, one word. What I'd like to say is: for this presentation, because the goal is to illustrate best practices, I'm deliberately taking a project that simplifies things so we can focus on the key points. So yes, one word per picture: if you look at the training data, it's one image at a time. There are limitations to this, but it's a project that's just starting out, which is why I thought it would be interesting to share with the community, and we can build on top of it and improve. Okay, let's do one more question, then we'll go back to the Mentimeter. How do you evaluate multi-label models? Multi-label... is it like multi-class classification? Yes, let's treat it as multi-class classification. In that case, the metric we pick is definitely accuracy. We want to make sure that when someone makes this sign, we predict the right word; if we predict any other label, that's a wrong prediction for us. We just count how many we predicted correctly out of all the predictions: that's the accuracy. So to answer Davos's question, evaluating the multi-class model is still accuracy: as long as the predicted label equals the actual label, it's counted as correct; anything else is counted as wrong.
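As a minimal sketch of both points, multi-class accuracy plus the latency constraint, reusing the hypothetical `best_model` and validation split from the earlier sketches (the 0.5-second budget is the one from the talk):

```python
import time

# Multi-class accuracy: a prediction counts only when it equals the actual label.
probs = best_model.predict_proba(flat(X_val))  # one probability vector per picture
pred_classes = probs.argmax(axis=1)            # pick the highest-probability word
accuracy = (pred_classes == y_val).mean()

# The constraint: a single prediction must finish within the 0.5-second budget.
start = time.perf_counter()
best_model.predict(flat(X_val[:1]))
latency = time.perf_counter() - start
assert latency < 0.5, f"prediction took {latency:.2f}s, over the 0.5s budget"
```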
Okay, let's go back to the Mentimeter then. You can continue. Thanks. So we have a lot of answers; let me take some time to look. "Define the problem." "Explore the data types." "Work out use cases and business values." That's pretty good: you are the CEO, so you want to know what to aim for, the business value of your model. That's definitely a good discussion to have with your team. And amass data, yes. "Create data collections and annotate them." Yeah, a lot of the answers are about data. Let me give you some tips. When we start a project, first, of course, we want to know what success looks like for this project. We also want to research existing work: the problem you're solving has sometimes already been tackled by another team, or in the literature, in some paper, or in some open source solution. So do search for that before you start, and make understanding that work part of your to-do list. The second part is definitely about the data. But before that, I also want you to take a small step back and visualize what the input and output of your algorithm look like. For us, we already had some of this discussion: what is the input? Is it an image? A sequence of words? A few seconds of video? Make sure to visualize what those look like; it will help you later to collect the data you actually want. You don't want to collect a pile of images and then later decide you don't need images, you actually need something else. That part needs a discussion with your team. So, back to the answers: a lot of you talked about collecting data. We've discussed visualizing what your data should look like, but one question remains: when you start spending time collecting data, how long should you spend? Should you collect as much data as possible so that your algorithm can train on as much as possible? Definitely not. You don't want to spend a whole month collecting data and later decide the data is useless. When you're starting up, collect just enough data to train your first version of the algorithm, and have a process where, once you get the data you collected, you explore it: use a tool to take a look at it and visualize its distribution. Later on I will show you some tools for doing that. Nowadays we also have a lot of ways of collecting data. For my example, I can go online and download pictures of hand signs. For other problems it might not be as practical or feasible to collect the data online; you might need to collect it yourself, or go to Amazon Mechanical Turk and hire people to generate it. But the advice stays the same: collect the data as fast as possible so that you have your first version. Sorry guys, I have someone knocking on my door; give me two seconds, I'll be back. I think some of you are asking whether there will be a video link. I will check with Engineers.SG, but it should be on Engineers.SG's YouTube channel, and I will post the link on the Facebook page, so just look out for the video. Thanks, Michael; maybe you can share the link with me, or just go over to our Facebook page and share it there as well. We'll be waiting for Zhang Qi. Paul, are you still there? Hello, so sorry. No worries, come. So yeah, as I was saying, later on I will show you some tools to explore your data. When you collect the data, don't just trust it: look at it, check the labels, see if everything is what you need. That's exploring the data. Okay, so: exploring the data. The tool I want to show you today is the What-If Tool.
This tool helps you visualize your data while writing as little code as possible. It might not be very easy for people new to machine learning, but that's exactly why I want to show it: to give you a little taste of it. When you're still getting around Python or Jupyter notebooks, you don't need to worry about this yet, but later on, when you're working with data quite a bit, you'll want some way to explore that data while writing as little code as possible. So let me show you. This example classifies people, predicting each person's salary based on their features. The features include age, education, hours per week, marital status, race, relationship, and sex. In this demonstration there are two competing models, model one and model two. Right now we're looking at model one's predictions: the blue means we predict income under 50K, the red means over 50K. You can switch to what model two predicts and see the difference. And the best part is that you don't even need a model yet: you can start by looking at the distribution of the different features, like age and capital gain, and this one, the label you want to predict. Already we can see a problem here. Remember, this is our label, the target we want to predict, and it's very skewed: lots of people are zero, meaning lots of people earn under 50K, and only a few people are over 50K. So when picking a metric, plain accuracy, like in my hand sign prediction project, is not going to fit this project: an algorithm could predict that everyone earns less than 50K and the model would already have very high accuracy. For this kind of data, you want to handle the skewed distribution, or pick a different metric. You can see a lot of distributions this way. And this one is also very interesting: slicing. Say I want to slice by sex; the tool splits the data into male here and female here, and we see the two models trying to predict for each group. The threshold is 0.5: when a model's score is higher than 0.5, we predict that the person earns over 50K. As we drag this threshold, we can see the predicted numbers change. So let's explore one very interesting thing: male income versus female income. Suppose we want our algorithm to be fair to both male and female. With the threshold set like this, for males we predict positive with accuracy in the 80-somethings; for females, at the same threshold, it's 90-something. If we actually want the female number to match the male number across the two models, we have to adjust the thresholds, and we find that the thresholds for male and female end up different.
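For reference, launching the What-If Tool inside a Jupyter notebook looks roughly like this. This is a sketch from memory of the `witwidget` API, assuming `examples` is a list of `tf.Example` protos built from the dataset, and `predict_fn_1` / `predict_fn_2` are hypothetical functions that take a list of examples and return one list of class probabilities per example:

```python
# pip install witwidget  (run inside a Jupyter notebook)
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

config = (WitConfigBuilder(examples)
          .set_custom_predict_fn(predict_fn_1)            # model one
          .set_compare_custom_predict_fn(predict_fn_2))   # model two, for side-by-side comparison
WitWidget(config, height=800)
```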
That's a hard truth the data is telling us: to match the two groups, the threshold for predicting that a male has a higher income has to be set higher. That's what this graph is about. And the most interesting part is that you can modify the data: switch a person's hours per week, or switch their education, and see whether your algorithm changes its decision. So that's this tool. It's very interesting, and it will take you some time to explore; after this talk, I hope you'll find the demo and play around with it. The other tool is TensorFlow Playground, which helps you get familiar with neural network layers. So, back to our project. Once we have the data, ours is some pictures of a hand; maybe I took them myself, in a room, in good lighting. Now we come up with the first model. Like I said, you want to be able to move fast: don't spend a lot of time building your first model only to find at the end that it doesn't perform well. Move fast: collect data fast, build your first model fast, and iterate fast. Why is it so important to move fast on these first few steps? Think of it this way: when you move fast on your first model and move fast on collecting data, you're not just saving yourself a few days; treat it as making your whole project move twice as fast. And when everything you do moves twice as fast, you are very competitive in the market, and in your job versus other competitors. So for the first model, I do the most basic thing for images: just one layer of neurons, fully connected, and an output layer. Here I only drew two outputs, but the number of outputs equals the number of words in the vocabulary of my sign detector. This is the simplest thing I could come up with. For another project, like classifying whether income is over 50K, you can be even simpler: even a logistic regression could work as your first model, and it could already perform quite well, giving you a baseline. So: one fully connected layer.
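A minimal sketch of that first model in Keras, essentially a multi-class logistic regression: flatten the 32x32x3 picture and feed it to one fully connected softmax layer. The optimizer and epoch count are illustrative choices, not from the talk; the data arrays come from the earlier split sketch.

```python
import tensorflow as tf

VOCAB_SIZE = 1000  # one output per word in the sign vocabulary

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),         # 32x32x3 pixels -> one vector
    tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),  # single fully connected layer
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# history.history records train/validation accuracy per epoch,
# the raw material for the learning curve described next.
history = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10)
```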
So, once you have this model and you start training and validating it, you will see its performance. Here on the left is a learning curve. The learning curve plots the error of my algorithm: the bottom curve is on the training data, the top curve is on the validation data, and along the bottom axis is the size of my training set. Why does the curve look like this? Imagine my training set is just one example, just one picture. My algorithm can very likely just memorize that picture and repeat it, so after training, the error on the training set is very low. But it's not going to perform well on data it hasn't seen before, like my validation set. Only as my data increases does it get harder and harder to model all these different pictures, different kinds of hand signs, different kinds of lighting, so the error on my training set starts to go up: it's harder to represent all of them. But because the model gains all this training data, it can represent a more and more complex function, so on data it hasn't seen before, the error will drop. The two curves will usually meet at some point; that's what this graph is trying to say. But this graph is also telling me about a problem with my model: after some training, I have a big gap between the error on my training set and the error on my validation set. What does that tell us? It tells me my model has high variance. High variance basically means my algorithm achieves a low training error, it can represent my training set very well, maybe it fits it too well, but on data it hasn't seen before, it doesn't work as well. In that case, I may simply not have enough data; I could go out and collect more. Or I can use other techniques, like augmentation: for pictures, augmenting usually means cropping them at different spots, since your hand could be in different regions of the picture. But one thing to be careful of with augmentation: in sign language, left and right matter, and a sign and its mirror image can be different signs. So when we do augmentation, we don't flip the pictures; a sign flipped top to bottom can mean a totally different word. We use this learning curve, and metrics like accuracy or the ROC curve, as feedback to improve our model. Our first model, with only one layer, is not very complex, so it couldn't quite represent our data. For the second model we come up with something more complex: more layers, and these are convolution layers. If you're interested, we can discuss them later; convolutions work well for images, but for other problems you might have simpler, fully connected layers. We stack them together, and the more layers you have, the more complex your model is, so it's able to model complex things like hand signs. And in our case, the softmax answers what someone asked earlier about predicting multiple labels: the softmax outputs a vector of probabilities across the different positions, and the highest probability is the one we predict. Say the fourth element has probability 0.9; that means we predict there's a 90% chance this sign is a four. So that's what it is. We did very well on training accuracy, because our model is more complex: it's 0.9 now. It also improved the test accuracy: not as good as training, but it gets to 0.78. So the more complex model increased the accuracy on data it hasn't seen before. But this is still a high-variance problem: we don't have a lot of error on training, but we have a lot of error on test. That means we might need to introduce some regularization into my model so it doesn't overfit my data so much, and in the next iteration, that's what we did. Don't worry about all the details here; they're very computer-vision specific. But I will point out these batch normalization layers: all of these blocks have normalization in them, which introduces regularization and makes sure my model doesn't overfit the data that much. You can see this model has a lot of layers, and each of these blocks contains several layers; it helped me improve the test accuracy a lot, from 0.78 to 0.86.
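Putting the second and third iterations together, here is a sketch assuming a recent TensorFlow: crop-only augmentation (no flips, since a mirrored sign can be a different word), stacked convolution layers, and batch normalization for regularization. The filter counts and crop size are illustrative, not the model from the talk.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    # Crop-only augmentation: hands can sit in different regions of the frame,
    # but we deliberately do NOT flip, because mirroring a sign can change its meaning.
    tf.keras.layers.RandomCrop(28, 28),  # random during training, deterministic at inference
    # Convolution blocks, each with batch normalization as regularization.
    tf.keras.layers.Conv2D(32, 3, padding="same", use_bias=False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding="same", use_bias=False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1000, activation="softmax"),  # one probability per vocabulary word
])
```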
But since there are a lot of layers, training takes a lot of time. I was actually training on CPU, on my own computer, which only has a CPU. Oh, sorry, what's the timing? Do I need to speed up? Another ten minutes, and then we can take the questions. Okay, I think I can do it. Training like this really needs to be done on a GPU; on GPU I could train much faster than on CPU. But while training takes a lot of time, prediction doesn't, so it fits my latency criterion. So, once you have your model and you want to launch it in production, make sure you launch it to some fraction of users first and do some testing. This is what advertising companies usually do when they have a new algorithm for predicting ad click-through rate, how likely a user is to click on an ad: they do A/B testing. And once you're in production, like I said, monitor the performance regularly, continuously. When you see some drift, create a task for yourself to update your model, or collect new data and iterate. If you are a startup, you don't have to worry too much: as we discussed, having an algorithm with some data to run on is good enough to start. But I would suggest that as you grow, you invest time investigating the tools in the ecosystem, like the data exploration tool I showed you; they speed you up and give you insights on the data. For infrastructure, also spend some time looking around the market, because all of these help you iterate faster and stay competitive. Monitoring is definitely one of them: be able to monitor how well your algorithm is predicting on real-world data, and make sure that when drift happens, you have a to-do. In Facebook, of course, at Facebook's scale, we need this infrastructure. We have a whole pipeline to train models; each row here is a model. As ML engineers we can actually deploy multiple models at once and pick the one with the best performance, so we don't have to wait on each one. Once we've picked the right model (this one is actually a decision tree), we can deploy it to the world within the same pipeline. And this picture you've seen before: the training / validation / test diagram I showed you earlier. In Facebook, we have the infrastructure to map all of these into one workflow, one application. I can show you, from data to feature computation.
What is the difference between data and features? Data is, of course, what sits in the database; features are data made ready for use in machine learning in our work. Say our database already has all the pictures of hands, and one of the employees has already computed the features for these images. Another employee who wants to work on the same problem doesn't need to compute these features anymore: look at this, we have a tool where everyone can share the features they've computed and use them together. The previous slide showed you the training screen; this one is evaluation and inference. Evaluation is checking during training that we're getting a really good model; inference is when we deploy to production and make predictions with it. This is Facebook's infrastructure, but for people not at Facebook, I would suggest investing some time in other toolsets: PyTorch is open source and has APIs for building this kind of pipeline, and take a look at Snowflake and Amazon's machine learning stack. Invest some time in the tools that will benefit you later. So, the takeaways: make sure to explore the data when you collect it, and visualize what kind of data you need. Do some research, search online, and look for open source projects you can utilize to kickstart your project. Choose one metric to optimize, and iterate quickly. And monitor the performance, to keep the model current. Yeah, that's it for this presentation, so I can take some questions. I do have a slide for people to submit questions, but you can put them in the chat as well. Okay, let's go with the chat first. Maybe let's do the one we may have missed, which is: what tools do you use to check for model drift? Not data drift, but model drift. Right, for performance drift: once we deploy, we monitor the metric we care about, like accuracy in my project; you can decide which metrics you want to monitor. A good rule of thumb is that once that value starts to go down, your model is getting outdated. The other thing: remember the data exploration tool I showed you, the What-If Tool, at the beginning? Suppose you trained on data with a certain distribution on age. When you monitor online, don't only monitor the output metric; you can also monitor the distribution of the data you're predicting on. If your app is no longer used by teens, and is now used more by people in their 30s or 40s, that's a change in your data, definitely a drift, so your model needs to be updated. Yeah. Okay. David, I think that answers your question about whether it's a single algorithm or multiple: it's just one single CNN model.
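A minimal sketch of those two checks, metric decay plus input-distribution shift, using a two-sample Kolmogorov-Smirnov test from SciPy; the accuracy floor, the p-value cutoff, and the age feature are illustrative assumptions:

```python
import numpy as np
from scipy.stats import ks_2samp

ACCURACY_FLOOR = 0.80  # hypothetical alert threshold for this project

def check_drift(live_accuracy, train_ages, live_ages):
    """Flag metric decay and input-distribution shift (e.g. on a user-age feature)."""
    if live_accuracy < ACCURACY_FLOOR:
        print(f"metric drift: live accuracy {live_accuracy:.2f} fell below {ACCURACY_FLOOR}")
    # Small p-value: the live feature distribution likely differs from training.
    _, p_value = ks_2samp(train_ages, live_ages)
    if p_value < 0.01:
        print(f"input drift: age distribution shifted (KS p={p_value:.4f})")

# e.g. training users were teens, live users now skew toward their 30s and 40s:
check_drift(0.72, np.random.normal(17, 2, 1000), np.random.normal(35, 5, 1000))
```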
Next, Francis is asking: are there methods you use to automatically recalibrate the model with the new data collected from users? That means, does your model actually have the ability to evolve over time, or do you have to retrain? Yeah, it's about retraining, and this part is hard to fully automate. You can set up some automation: monitor the performance, give it a threshold, say it drops one point of accuracy, and then you can automate the retraining process. But usually we would try to investigate a little bit first, instead of letting the system recalibrate itself. That's definitely up to the team. Okay, next question: do you do any foreground detection or background removal on the images? For this project, no, because the images are quite small, and we're quite confident the neural network is more complex than a 32-by-32 picture needs. But you bring up a good point: for very large, complex pictures, like object detection for self-driving, it's different. For me, I can take pictures of my hands thousands of times in a very short time; but for a self-driving car, you're unlikely to have enough pictures from cars on every different road. For those kinds of problems, where you don't have enough data, you might divide your problem into several phases: first detect the car or the face, put a bounding box around it, and remove everything else; then a second network tries to recognize what that bounding box contains, maybe a stop sign, maybe a car. For me, I don't need to crop any pictures. So you never crop your pictures, or rotate them, or make other changes? Right, because I have enough data, I don't need to do the augmentation. But if I didn't have enough data, I might need to start cropping to create more training data.
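A minimal sketch of that two-phase idea: `detect_hands` and `classify_sign` are placeholder stubs standing in for a real object detector and for the classifier from the earlier sketches; neither is from the talk.

```python
def detect_hands(image):
    # Placeholder: a real object-detection model would return bounding boxes here.
    return [(0, 0, 32, 32)]

def classify_sign(crop):
    # Placeholder: the CNN classifier from the earlier sketches would run here.
    return "apple"

def predict_signs(image):
    words = []
    for (x0, y0, x1, y1) in detect_hands(image):  # phase 1: box each hand
        crop = image[y0:y1, x0:x1]                # cut that region out of the frame
        words.append(classify_sign(crop))         # phase 2: classify just the crop
    return words
```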
Okay, maybe we'll just take one more question in view of the time. Daryl asks: will more layers actually always result in overfitting? Based on your experience, is that the case? Nah. Nowadays, especially in the days we're in, we definitely have tons of data, sometimes more data than we need, in online companies and the connected world, and more and more of the work is on deeper and deeper networks. How do they combat the deep layers? A deeper model usually means more complex: it can represent more complex things, so it could overfit your data. They combat this by simply putting more data into training. So your question is right: with deeper layers you would overfit your existing data, but you can also increase your data. Okay, so increase the data to combat the overfitting, even as you increase the number of layers. Yeah. Okay, one question I haven't asked yet: David is asking whether you can share a bit more with the audience about the training dataset. How many pictures are there, and how many pictures per class? I might not remember the exact count for each class, but in total the pictures number around 60,000. And roughly how many classes? Because this is an illustration project, I could decide on the classes myself, so from the dictionary I just picked a thousand of them. Okay, so a thousand classes across 60,000 pictures. Okay, I think that's about it for today. To the rest of you, thank you very much for attending. I would also like to thank Zhang Qi for speaking at our Data Science SG event. Thank you very much, Zhang Qi. Thank you, and thank you, Joy, for arranging. Thanks, Kru. Okay, thank you. Yeah, so unfortunately, on a Zoom meetup we can't clap to show our appreciation, but I'll just clap on behalf of everyone. Thank you very much, Zhang Qi, for speaking. Oh yes, David, you can actually put a clap on the reactions panel. Okay, thank you, Kamar. Yes, thanks, everyone. So, with that being said, I think that's all for today. Do look out on our Facebook page for the link to the video; Michael will share the link. And do look out for our next event in November. With that being said, thank you very much, guys and girls. Thank you for attending. Bye bye. Good night.