Hi everyone. Thanks for joining. I'm Kellan Betz, a course lead in the MITx MicroMasters program in SCM here at the MIT Center for Transportation and Logistics. I'm co-hosting today's live event with my colleague Laura Alegha, also a course lead in the MicroMasters program. Today we're very fortunate to have Dr. Sergio Capiero with us, a senior data scientist at Amazon. Welcome, Sergio. Hi, Kellan. Hi, Laura. Thank you for having me. It's always a pleasure to participate in live events and be in touch with the MicroMasters community. Awesome. We're very fortunate to have you. Before we kick things off, we wanted to start the event with a poll. For those of you who've been to our events before, you know we like to start things off with a poll, so if we could launch our first poll here. Awesome. We want to learn a little bit more about our audience today: why are you here? There are a few options; I'll just go through a couple of them. I want to learn about supply chain data and analytics in general. I'm interested in knowing more about data preparation processes and strategies. I want to know more about using data for machine learning. There are lots of interesting topics, and hopefully we'll cover many of them, but we wanted to give you a minute or so to fill out that poll. While you do, I'll pass it to my colleague Laura, who will go through a brief agenda for today's session. Thank you, Kellan. Welcome, Sergio, and welcome everyone in the audience. During the next 15 minutes, Sergio will talk about the importance of data preparation for training accurate machine learning models. We will also discuss the challenges typically encountered when preparing data, and then we will share some best practices that may help you. Kellan and I will then ask a number of questions we have prepared for him, and we will save time at the very end for your questions. Please make sure you use the webinar Q&A feature to ask your questions, and be sure to be logged in with your name, because we will not read anonymous questions. So be ready to participate; we want to make this a very interactive event. And with that, I guess it's good timing to go to the poll results. It seems, Sergio, that most people here want to learn about supply chain data and analytics and about data preparation, so I think we are ready to kick it off with you, and you will probably address all of this. Hopefully the audience will be very happy to be here today. Okay, so let's get the party started. Awesome. So let me share my screen. Okay, let's get started. The main topic today is data preparation for supply chain analytics, with a particular focus on how we can be ready for machine learning. Over the last decades, we've seen how machine learning has expanded into the consumer industry. Many of the products that we buy and the services that we use embed some kind of machine learning algorithm, and this trend will continue to accelerate in the future. For instance, machine learning is embedded in voice-controlled devices that can play music or read the news, or that we can use to order groceries online by talking to them from anywhere in our homes. Every time we use Google and start typing a search term, and Google recommends possible search terms, that's also machine learning in action.
Online retailers such as Amazon are also collecting our data, our preferences, and our buying habits in order to tailor the shopping experience to our needs. And similar machine learning algorithms are used every time we watch a movie or listen to music on online streaming services. For instance, the music streaming services are collecting our listening habits, the music we listen to and our preferences, in order to suggest new music or new songs that we might like to hear as well. We've also seen the application of machine learning in services such as banking. Every time, for example, there's an unusually large transaction in our bank accounts, we typically receive an alert by text message warning us about this uncommon behavior. And also in transportation, with all the navigation services such as Google Maps: every time we are looking for the quickest or nearest path to a location, transit times need to be estimated, and these transit time estimates are mainly based on machine learning models. One of the key contributing factors to this expansion of machine learning in the consumer industry is, of course, data proliferation. This is reflected in the amount of digital data being produced: on a daily basis, roughly 2.5 quintillion bytes of data are generated, and 90% of the world's data has been produced over the last two years. This means that every two years, the amount of digital information out there doubles. And of course, supply chains are also generating massive amounts of data. For example, Amazon sells roughly 480 million unique items to close to a quarter of a billion customers. Walmart handles more than one million customer transactions every single hour. And on a daily basis, UPS delivers more than 20 million packages to roughly 8.4 million delivery points. You can imagine that these companies are generating a vast amount of information, and this information is not only about the products themselves, the different characteristics of the products being sold and delivered, but also about the customers: what their preferences are, what they are searching for, what they are buying. And also information related to the supply chain: what transportation legs we are using, what the routes and stops look like, transit times, et cetera. So behind all of these massive numbers, we can see that there is also massive information. But it's not only volume. In these figures we observe that large supply chains are generating massive amounts of information, but it's also the speed and the variety at which the data is being generated: every single second, supply chains are generating different types of information, and the types of information being generated are much broader now. But what is machine learning? Machine learning is so disruptive because it offers a fundamentally different approach to programming computers. The traditional approach to teaching a computer a new task focuses on codifying existing knowledge. Basically, with the traditional approach, we try to reflect the knowledge of the programmer and translate that knowledge into a few lines of code.
So basically, we are abstracting a task based on the knowledge of the programmer, and we are writing a few lines of code to execute the task so that the machine or the computer will follow these lines and perform the task. What we are doing here is trying to translate our knowledge into a few lines of code. And of course, this has a main weakness, and that weakness is related to what is known as Polanyi's paradox. This paradox says that we know more than we can tell, meaning that our knowledge is so extensive that we might know how a certain task works, but it's difficult or sometimes impossible to translate it into words. For example, how can you tell a computer how to recognize a cat's face? It would be almost impossible to write code to perform that identification, that visual task; if it were possible, it would require thousands of lines of code. So this is where machine learning comes into play. Machine learning offers a paradigm shift: it involves programming computers in a different way. Basically, we are asking the machine to learn from data and from past experience. This approach is similar to the approach we use with our kids, for example. If I want my child to learn what a cat is, I will show her cats in different contexts, and every time she sees a cat, I will tell her, this is a cat. She will understand what a cat looks like in different contexts and will be able to recognize one. It's a similar approach here, in the sense that we are going to offer the machines, the computers, different examples and tag them as cats or as different animals, so that based on this experience, based on this exposure to different images, the machine will in the end be able to recognize the object we are trying to detect. Okay, so let's see how the machine learning process works end to end. Everything starts with some question that we want to answer: we have a business problem that we are trying to solve. Based on this business problem, we start with data collection at the very beginning. This data collection is basically collecting historical information, that is, data that we have accumulated over time. Of course, this data will come from multiple sources and might come in different formats. We might receive numeric data coming from tables, but we might also get information that comes as text or even images. The next step is that we need to prepare the data. As you can imagine, the machine learning algorithm will require the data to be in a specific format; for example, we need to provide the machine learning algorithms with numeric data, so somehow we need to translate all the information coming from different sources and formats into a single format. This is typically what we do during data preparation: we prepare the data so it is ready for further analysis. Based on the historical data, after we clean it up and prepare it, we get the training data. This training data is going to be the set of examples I was referring to, the images of our cats or similar information, that is going to be fed into the machine learning algorithm in order to train it. And this is when the training happens.
So we offer these examples, these samples, to the computer, and the training happens. At the end, after the machine learns to recognize the cat, for example, it will come up with a model. The model is basically just saying: if I get an image, I will recognize whether it's a cat or not. But this is only one piece of the equation, of course. We started by trying to solve a problem, and we ended up with a model, but this is not the final goal. We have a dataset with images of cats, for instance, but we already know that information; that's the training set. What we want the machine to do is to recognize new objects: to offer the machine new objects and see if it is able to recognize the object in the picture, for example. That's why there's also another stream of data that we need to collect: the live or ongoing data stream. Again, this is coming from similar sources and in different formats, but the main difference from the historical data is that we are gathering this information live. As before, this data should also be harmonized and aggregated; basically, it is going to follow exactly the same processing steps that we applied to our historical data. And at the end, we are going to use the model that was trained, but now with the new information that we have, with this new ongoing data. In the end, we care about the outputs: we are going to be making predictions on the new data. So as you can see, data preparation is not only key before training; it is also very important to process the data when we are receiving this live data, because at the end we mainly care about the output, we mainly care about using the computer or the machine to recognize new objects, not only objects in the historical data. As you can see, what we are basically doing with the model is a prediction. In this case, prediction implies taking information of one kind and getting information of another kind. We are providing, for instance, a picture that contains an object, and the machine or the computer will tell us what object is contained in the image. So we are getting this prediction, and the prediction can be interpreted as a simple A-to-B mapping. The model is going to take some inputs, which will be the A, and based on the trained model, based on the training data, it is going to give a response, the B. This is just an A-to-B mapping: we are converting information that we know, the image, the pixels in the image, into information that we don't know but care about, which in this case is the object that is in the image. And we can mention different examples of this. For example, every time we get an email, Gmail needs to decide whether the email is spam or not. Based on different characteristics of the email and on the training set, it will decide whether the incoming email should be classified or flagged as spam or not.
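A minimal sketch of this end-to-end flow, assuming pandas and scikit-learn: the same preparation fitted on the historical training data is reused on the incoming live data before asking the model for a prediction. The column names and the spam task are illustrative assumptions, not the exact setup described in the talk.

```python
# Sketch of the end-to-end flow: train once on prepared historical data,
# then apply the SAME preparation to live data before predicting.
# Column names and the spam example are hypothetical.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Historical (labeled) data: features are the "A", the label is the "B".
historical = pd.DataFrame({
    "num_links": [0, 7, 1, 12, 0, 9],
    "num_words": [120, 35, 400, 20, 250, 15],
    "is_spam":   [0, 1, 0, 1, 0, 1],   # what we already know
})
X_train = historical[["num_links", "num_words"]]
y_train = historical["is_spam"]

# Preparation and model bundled together, so live data is processed identically.
model = Pipeline([
    ("scale", StandardScaler()),          # data preparation step
    ("classify", LogisticRegression()),   # the learned A-to-B mapping
])
model.fit(X_train, y_train)

# Live / ongoing data: same columns, no label; the output is what we care about.
live = pd.DataFrame({"num_links": [5], "num_words": [28]})
print(model.predict(live))   # e.g. [1] -> flagged as spam
```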
I mentioned the example of the image and the object; a similar use case is every time we transcribe an audio clip. The audio clip is the input, and the task is transcribing the audio clip into text, or the other way around, every time we have a text and we ask the computer, for example, to read that text aloud. And finally, in translation services as well: every time we translate, for instance, from English to French, this is exactly what is happening. In supply chains, we typically associate prediction with the future: we look at historical information in order to make a forecast about some future behavior. However, as you can see in all of these examples, the term prediction in machine learning has a broader meaning. Prediction can of course be used to forecast something in the future, but we can also use prediction for real time or even the past. For example, every time we get an email, Gmail has to make a live decision to determine whether the email is spam or not; we are not forecasting the future, we are making a prediction about the present. And the same thing happens every time there is an unusual transaction in banking, for instance. Another key characteristic of machine learning algorithms is that they provide more accurate predictions compared to traditional statistical methods. This can be better understood with this graphic. In this graphic, what we are trying to reflect is the performance of the algorithm, of the prediction; you can think, for example, of how accurately the machine is able to recognize cats versus the amount of data we feed it, that is, the number of images or pictures provided to the machine for training. If we use traditional machine learning algorithms, this is the typical behavior of the performance: we start feeding in more and more data and we see a rapid increase in the performance of the algorithm, a rapid increase in its accuracy. But at some point it reaches a threshold, and basically, regardless of whether we feed in more and more data, the accuracy stays about the same. But what happens if we move to more advanced machine learning, if we use, for example, a small neural network, if we use deep learning? As we can see, again we get a very rapid increase in accuracy, in performance, and as we keep feeding in more data, the performance keeps increasing. This increase can be even more drastic when we have a large neural network: when we have millions of neurons, the performance keeps improving as we add data.
Of course, there is a theoretical maximum for the accuracy and the performance, but the main point here is that if we use these very advanced machine learning algorithms, the performance or accuracy we can achieve is much higher, and of course this implies using more and more data. As I mentioned, machine learning requires data to be in a specific format: most machine learning algorithms require all the inputs to be numeric. Of course, we can apply machine learning to images or text, but somehow we need to process that data to convert it into the proper format for the machine learning algorithm to use, and that requires some preparation of the data. As you can imagine, even if we have the perfect model, garbage in, garbage out applies here: regardless of whether we have a perfect model, if we are feeding the model with information or data that is not of good quality, that is not precise, then of course the prediction we get is also going to be useless or misleading. So it is very important for machine learning algorithms to have quality inputs. Usually, when we talk about data preparation, what we are referring to is mainly getting all the data ready for further analysis, and typically data preparation involves two sets of tasks: on one hand we have data preprocessing, and on the other we have feature engineering. Let me talk about these in more detail. When we refer to data preprocessing, we are basically talking about specific tasks that we need to perform on the data. For example, we need to clean the data: somehow we need to replace erroneous values, we need to impute missing information, we need to partition the dataset, and similar tasks. The first one, of course, is cleaning the data, and when cleaning the data there are two main things to look for. The first is removing outliers: we are going to be looking for and identifying these atypical, very high or very low values, and usually we need to define what the better strategy is for us, based on the context of the analysis, to deal with these outliers. The second is duplicates: if we have duplicate information, we might also need to remove it. Data cleaning also includes additional tasks, for instance replacing missing or inaccurate data, or correcting and filling in missing values; we might use some imputation techniques here. Also, as part of this data preprocessing, we need to partition the data. Typically we will divide our dataset into three different sets: a training set, a validation set, and a test set. As the name says, the training dataset is going to be used to train our models; the validation set is going to be used to tune the different parameters that the model might need; and finally, the test data is going to be used to evaluate the performance, for example the accuracy, of the model. And of course, we will need to randomly divide our original dataset into these three subsets.
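A minimal sketch of the cleaning and partitioning steps just described, assuming a pandas DataFrame of historical shipment data; the file name, column names, and the three-standard-deviation outlier rule are hypothetical choices, not a prescribed recipe.

```python
# Sketch of cleaning (duplicates, outliers, missing values) and partitioning
# into training / validation / test sets. Thresholds and columns are illustrative.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("shipments.csv")          # hypothetical historical data

# 1. Remove exact duplicate records.
df = df.drop_duplicates()

# 2. Remove outliers, e.g. transit times more than 3 standard deviations away.
z = (df["transit_days"] - df["transit_days"].mean()) / df["transit_days"].std()
df = df[np.abs(z) <= 3]

# 3. Impute missing values, e.g. fill missing weights with the median.
df["weight_kg"] = df["weight_kg"].fillna(df["weight_kg"].median())

# 4. Randomly partition: 70% training, 15% validation, 15% test.
train, temp = train_test_split(df, test_size=0.30, random_state=42)
validation, test = train_test_split(temp, test_size=0.50, random_state=42)
```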
It is also important that all the features, all the columns, all the variables that are going to be used in the model are on a similar scale, in order to make sure that all features are equally important. This is particularly true for specific machine learning algorithms that calculate distances between features. Here, two strategies can be followed. We can normalize the data, the features, which basically means transforming our data onto a zero-to-one scale, so that all the values of the new column are contained within that range, between zero and one. The second option is to standardize the data, so that all the data is centered at a mean of zero and has a standard deviation of one. And finally, we can also explore data augmentation. This is a strategy to artificially create data from existing data, and it is quite useful especially when we have a small dataset: we are basically creating new, synthetic data based on the historical information that we currently have. Regarding feature engineering, in this part what we are trying to do is identify which are the best features to feed into the model. In the previous step, we just cleaned the data, divided it into the different sets, and generated more data if needed; now it is time to focus on which variables are going to be used in the model. And of course, there is a lot of visualization that can help here. We are trying to identify which will be the most meaningful features or variables to use in the model, and a good way to identify these is through different analyses, correlation analysis, or different graphs. But there are also specific tasks we can perform during feature engineering. One is feature selection, which is basically the process of trying to identify which are the best features for the model. This can be achieved by computing an importance score, basically running an analysis that tells me which are the more important features, but we can also run, for example, a correlation analysis to see which would be the best choices to make. Essentially, what we are doing here is filtering for the most important features to be included in the model. We might also need to transform the data somehow. This task implies modifying the data while keeping the same information: for instance, we might take the logarithm of a specific variable, or we might use a Cartesian product, for example, when we believe there may be a relationship between categorical variables. Basically, we are playing with the data and coming up with additional features. A similar thing happens with feature creation, which involves creating new data from existing data. For instance, every time we use one-hot encoding, when we translate categorical variables into dummy variables, we are creating new features based on that encoding; or every time we calculate new features from the current variables that we have, we are creating a new feature. And finally, feature extraction. This applies particularly to cases in which we have very large datasets, not in terms of the number of examples or samples, but in terms of the number of features. In that case, it might be a good idea to reduce the amount of data being processed, especially to consume fewer computational resources and have a model that runs faster. There are a number of methods that can be used here; for example, we can use PCA, which allows us to reduce the dimensionality of our dataset, basically reducing the number of columns that are going to be used.
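A minimal sketch of these scaling, transformation, feature creation, and feature extraction ideas, assuming pandas and scikit-learn; the columns are made up for illustration only.

```python
# Sketch of normalization, standardization, a log transform, one-hot encoding,
# and PCA for dimensionality reduction. Column names are illustrative.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.decomposition import PCA

df = pd.DataFrame({
    "distance_km": [120.0, 850.0, 40.0, 2300.0],
    "order_value": [35.0, 1200.0, 80.0, 15000.0],
    "carrier":     ["road", "air", "road", "sea"],
})

# Normalize: rescale a feature to the 0-1 range.
df["distance_norm"] = MinMaxScaler().fit_transform(df[["distance_km"]]).ravel()

# Standardize: center the feature at mean 0 with standard deviation 1.
df["distance_std"] = StandardScaler().fit_transform(df[["distance_km"]]).ravel()

# Transform: take the logarithm of a heavily skewed variable.
df["log_order_value"] = np.log1p(df["order_value"])

# Feature creation: one-hot encode a categorical variable into dummy columns.
df = pd.get_dummies(df, columns=["carrier"], prefix="carrier")

# Feature extraction: PCA to reduce the number of numeric columns.
numeric = df[["distance_norm", "distance_std", "log_order_value"]]
components = PCA(n_components=2).fit_transform(numeric)
print(df.head(), components, sep="\n")
```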
And with that, let me stop here and open it up to any questions you might have. Awesome. Well, thank you, Sergio. A fascinating presentation that really highlights the central role data preparation plays in the process, how it fits into that pipeline, and where we can then start to utilize some of these advanced, transformational techniques like machine learning. Before we dive into some specific questions on data and machine learning, I also see that there are already a bunch of questions in the Q&A, so that's awesome. Keep it up; please jump in there with your questions if you have them, and we'll keep an eye on that Q&A feature. But before diving into specific questions about data and machine learning, I wanted to start with a broad question on your experience: what do you see as the role for data in supply chain strategies, supply chain designs, and supply chain operations? What role do data and machine learning play in these different functions? Okay. So data is very important for the definition of the supply chain strategy, but it's also very important for streamlining operations, for coming up with new products and services, and for improving the user or consumer experience. By collecting data, what we can achieve, basically, is to improve our operations, because in the end the data becomes metrics, and based on these metrics we can improve the operations. So that is one use of our data, to improve our operations, but data can also allow us to improve transparency and visibility. For example, with a shipment, we can track at every single moment where the shipment is; we can use data to enhance the visibility that we have. And of course, data is just data unless we can further analyze it, and that is where the importance of data analytics comes into play. Data analytics, or supply chain analytics, basically allows us to analyze and process the data in order to improve decision-making. In the end, the main goal of data analytics and of data should be to support decision-making in all the aspects and all the functions of supply chains. Thank you, Sergio, for bringing that up. I love connecting data to visibility, because that's very important to us and we always work on this end-to-end vision, so that's a great insight. And I wanted also to bring it down to the everyday tasks we do when we are working in supply chain. We also have a lot of questions on this from the audience, so it's great that we bring it up right here, right now. We would love to hear about the importance and role of data.
For example, in topics like network design models, when we are defining the optimal location for a new facility, or when we are trying to work out the best physical flow for our products at a certain point in time. We would love to know how data collection and preparation may differ when we're approaching different types of supply chain problems, and whether there is any challenge you foresee that you would like to share with us. Sure. Optimization models, and in particular network design models, are also data-intensive tasks. We need to collect or forecast the demand, for example, and we need to estimate distances or transit times in order to feed these models, so we need data collection and data preparation for these models too. However, the main difference that I identify is in the level of intensity that we need in these tasks. As I mentioned, machine learning really needs a lot of data, so data collection and data preparation are usually very resource- and time-consuming tasks. In optimization models, at least those that support strategic decisions, every time, for example, that we are designing the network or solving a facility location problem, this is not a decision that we are going to be making on a monthly basis; usually the time span involved is at least a few years. That means the data collection and data preparation is going to be a single effort, and that effort is not going to be repeated for the next four or five years. However, in the case of machine learning, we might have a one-time effort to collect the training data, but as I mentioned, we also need to collect live data, so the effort to record that data is going to be on a daily basis: every day we should be collecting the new data, cleaning the new data, and feeding that data into the model to get the predictions that we want. So the main difference, I would say, is the level of intensity and the resources these tasks consume. Awesome. Thank you. That's a great insight. So I want to pull in a couple of questions here. I'm going to tie a couple of questions that I see in the Q&A into one we had prepared on this process of cleaning data, and on the concept of cleaning data in the first place. If you think about it, maybe you have your perfect model, but if you had perfect data to begin with, we wouldn't need to clean it in the first place, or maybe we would have fewer or different types of data transformation or data processing needs. So then maybe the question, and this also ties into a question by Rishi, is: in your experience, what are some of the sources of those data quality issues upstream? Is it all the way upstream at the point of capture with the sensors, or is it the systems, or the databases the data is stored in? What are some of the weaknesses within that data pipeline, and what are some of the processes you can use to try to account for some of those quality issues you notice upstream? Got it. Yeah. So two things here. On one hand, data quality might be an issue, and data quality issues mainly come from problems we may have during data capture. We may have different levels of accuracy in our measuring systems.
That's one source of error; it's related to the precision of the instruments that we are using, but maybe the instruments are also not very well calibrated. So there are some measurement issues we may have, precision-related or maybe just a calibration issue. But also keep in mind that there is still an important share of the data that is being collected manually: companies are still using some manual inputs to feed their systems, and of course these manual, human-intensive processes are prone to error. So that's one thing: issues related to quality, mainly coming from the data capturing piece. However, even if we have perfect data in terms of quality, we might still need some processing of the data, because the data we need for a particular model will almost surely come from different sources, and these different sources might be providing the data at different levels of granularity. So we will need some data to be aggregated, some data to be disaggregated, and we need to put it in the right format as well. So somehow, data preparation is going to be required for our models. Thank you, Sergio. And I want to switch gears to talk about tools specifically, because we have a lot of questions about tools, and we were also wondering what your approach or your recommendation is. I would say that it depends on the use case. For instance, say we are supply chain managers: we mainly care about the business, and we might be running some analysis or building some machine learning models a few times a year, maybe three or four times a year, for a specific use case, a specific analysis that we need to do. It's not going to be an in-depth analysis; it's going to be some exploration, trying to come up with a hypothesis about the business. In that situation, for example, we can use many of the plug-and-play tools out there, for example Orange. It doesn't really require any specific coding skills; we just plug and play, create a few charts in Orange, identify what might be the relevant relations, and based on that we can train a few models. That would be the best time investment for this type of user. But of course, if you are a supply chain data analyst, you are going to be running these models more times a year, so you will need to go to tools that require some coding skills, and R, for example, is an option here, or Python as well. But this is, again, the intermediate user: you are going to be running this analysis on a more frequent basis, but at the end the analysis is going to be used to make recommendations or to inform decisions; your models are not going to be implemented in company systems, they are only there to inform decisions. However, the use case might be different. We may have, for example, as in the example I was showing, recommendation services, Spotify for instance, suggesting new songs that you might like. In those situations, there is a machine learning model that lives in the company systems; there is a recommendation system in there. For that use case, of course, you are not going to be using these plug-and-play tools, or just Python or R on their own. In those situations, you have to use systems that will allow you to scale your operations.
Scaling in terms of the data that you are going to be using, but also in terms of the computational power that you will need. In those situations, the cloud tools are going to be your best option. Awesome. Thank you. There is definitely a diversity of tools out there, and it makes sense to contextualize them based on the use case. Makes a lot of sense. I'm keeping an eye on the time here, and I know we're getting close, so if we could launch our second poll before we close things out. I think we may have time for maybe one more question there, Laura, and I see one here that looks interesting from Alton Edis, and I apologize if I'm pronouncing your name wrong. This question I think is an interesting one: how are neural networks built, and what infrastructure is used? How do you actually build and run a neural network machine learning model? It seems like a relevant question, because you mentioned those as being some of the more accurate types of models in your presentation. Yeah. So there are specific libraries, Python libraries for instance, that you can use to build small and also large neural networks. The main difference compared to traditional methods is, of course, the accuracy that I mentioned you can achieve with these models, but also that you can get rid of part of the data preparation, basically everything related to feature engineering, that is, selecting which are the ideal features to include in your analysis or your model. All of those tasks can be taken care of by the neural network: you can essentially skip feature engineering, feed your neural network with all the features that you have, and the model itself will identify which are the most relevant features. Thank you, Sergio. I hope you can hear me well now. I want to bring one last question before we go to the poll results. Michael and Lucas are bringing up small and medium-sized businesses. How does this work for these kinds of businesses, improving data gathering and preparation and the use of data, and how possible and feasible is the implementation of machine learning when we're talking about small businesses? Yeah, for small businesses, I think it's critical first to identify what the machine learning need would be: what is the use case, what is the problem they are trying to solve. Once they identify the problem they're trying to solve, the next question will be whether machine learning could be a solution to that. As I was explaining, machine learning offers a way to come up with predictions, so the main usage of machine learning will be for these predictions. If machine learning is an option, the next question will be about data collection, because especially for small businesses, we cannot just collect as much data as we want; data collection also comes with a cost. In that case, we have to focus on the data that we believe is going to be relevant for the analysis, and collecting just the data we need is going to be critical. Then we can use some online tools for all the data processing and also to come up with a model; there are typical services where you just pay a variable cost, by just using or training the model a few times.
In that way, you don't have to invest in infrastructure or additional resources to train your models. Awesome. Thank you, Sergio. It definitely makes sense, the difference between a company, you mentioned Walmart and Amazon as examples, where Walmart is processing a million transactions per hour, which is kind of mind-boggling scale, versus a small business which might have a few customers an hour. Definitely a difference in scale, different approaches, and a different volume of data to manage. So maybe we could take a quick look at that poll and then we'll wrap up our live event. The question was: what is the most interesting part of today's session for you? It looks like the most common answers were expanding my knowledge of data and analytics, learning about the data preparation process, and learning about specific applications of data and ML in supply chain. So that's great; we're glad you found those topics interesting. There are lots of questions in the Q&A; we always have way more questions than we have time for. But I don't know, Laura or Sergio, if you have any final comments you'd like to share before we wrap things up. I would just like to thank Sergio for joining us today and for bringing a lot of great insights to our audience. You addressed most of the questions we received before we even got to the Q&A part, so that's great; it means everything we provided was of interest to them. That's amazing, and we would love to see you again in the future, so hopefully we can host you again soon. It's been my pleasure. As I mentioned at the beginning, it's always a pleasure to be in touch with you and with the MicroMasters community. Thanks, Kellan, for having me. Yeah, thank you, Sergio. And thank you, Laura, for co-hosting. And thank you, everyone, for attending today. Bye. See you in the next one.