Thank you. Zain, please come up and share about your team and the competition.

Hi, everyone. I'm Zain from the Data Ninja team. I believe we were one of only two teams from Singapore in this competition, and the other team will be presenting later as well, which I'm very happy to see. Let me start by introducing the Data Science Game. So what is the Data Science Game? It's a student data science competition, organised by students from a university in Paris, and this is the second year they have run it. So for those of you who are still studying, if you're interested in taking part next year, I believe they will organise it again, and I think it's well worth it. I'll quickly go over how the competition is organised.

The final was actually a quote conversion challenge organised by AXA, by the data science team from AXA, and it was a three-day hackathon held in a castle just outside Paris, which is a really beautiful place. That was the final phase.

The online qualification phase was an image classification challenge. First, the business objective: I always like to understand the business objective before diving into a challenge, to understand what they're trying to do in a business sense. They are looking to optimise solar energy production. How do they do that? They want to know the orientation of a specific roof from satellite images, because certain orientations, such as flat roofs or certain facings, are more optimal for solar energy production. The technical objective is: given the satellite images and the labels, how do you train an algorithm to correctly classify the orientation of the roofs? The images are hand-labelled: they crowd-sourced the images and had people hand-label them, and they only kept an image and its label if more than three people agreed on the label, and you'll see later why.

So the first class is the north-south orientation, which is one of the optimal ones; the second is the east-west orientation; the third is a flat roof; and the fourth is just "others", whatever doesn't fit the other three. If you go to their website, it's actually really fun to play with: they have a page where you can label roofs manually, like "this is probably a flat one", "this is flat", and if you don't know, you can just mark it as "others".

As you'd expect with crowd-sourced data, there's a lot of noisy data. We had images like this, where you can't even tell what it is. This one is actually from the test set; it's one of the images our models always got wrong, so we searched for it manually to see what it was. And it's not really clear: it could be either a flat or a north-south orientation, so it's very ambiguous.
And we have this one, where I have no idea what kind of roof it is. This one is also ambiguous: it could be north-south, or it could be east-west; it's a combination, so it probably means "others".

OK, the evaluation metric is simply classification accuracy. We had 8,000 hand-labelled training images and a 40,000-image test set. For those who are not familiar with Kaggle-style data science competitions: you are given these two sets, and you need to train a model to predict the test set, for which you don't have the labels. The test set is split 40% into a public leaderboard and 60% into a private leaderboard, so we only see the public leaderboard during the competition, and the private leaderboard result is only revealed at the end of the competition. These are the proportions of the labels in the training data: it's predominantly class 1, and then classes 2 and 4.

For the data preparation, these are basically the things we did. We did data augmentation: we created more images from the 8,000 training images by rotating randomly up to 70 degrees, horizontal and vertical shifts of up to 10% of the width and height, shear of up to 0.2 radians, which is about 11 degrees (shear is when you take an image and slant it), zoom of up to 20%, horizontal flips, and standard scaling, so the images are standardised by subtracting the mean and dividing by the standard deviation. We did this 5 to 10 times: for some models we augmented to 5 times the 8,000 images, and sometimes 10 times, but in the end we settled on 5.

Something we learnt after the competition from the other top teams, which we unfortunately didn't do: you can actually rotate the label 1 and label 2 images by 90 degrees, which simply switches the labels, something very smart that we didn't think of. You can also freely rotate labels 3 and 4, the flat roofs and the others. With that you can get up to 90% more training data, and if you augment further on top of that you get even more. So these are some tricks that we learnt.

OK, so that's what we did. As in any Kaggle-style competition, the first thing we tried was XGBoost, because it's like the master algorithm on Kaggle. How did we do that? First we converted each image into a flat feature matrix: each row is the 64x64 pixels of one image, and we just used grayscale. Then we trained the usual XGBoost model. The accuracy was about 56% on the leaderboard, which is not terrible and is better than a random model, but it can't compare with deep learning models. Now I'll pass to David to talk more about the models.
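As a rough illustration (not the team's actual code), the augmentation pipeline described above maps closely onto Keras' ImageDataGenerator; the parameter values follow the numbers quoted in the talk, and the array names are hypothetical.

    import numpy as np
    from keras.preprocessing.image import ImageDataGenerator

    # x_train: (N, 64, 64, 1) grayscale roof tiles; y_train: one-hot labels for the 4 classes
    datagen = ImageDataGenerator(
        rotation_range=70,                    # random rotations, up to the quoted 70 degrees
        width_shift_range=0.1,                # horizontal shift, max 10% of the width
        height_shift_range=0.1,               # vertical shift, max 10% of the height
        shear_range=0.2,                      # shear intensity (~0.2 rad, about 11 degrees, in older Keras)
        zoom_range=0.2,                       # zoom up to 20%
        horizontal_flip=True,                 # random horizontal flips
        featurewise_center=True,              # subtract the dataset mean ...
        featurewise_std_normalization=True,   # ... and divide by the standard deviation
    )
    datagen.fit(x_train)                      # estimate mean/std for the standardisation

    # Materialise roughly 5x the original training set, as described in the talk.
    batches_x, batches_y = [], []
    for bx, by in datagen.flow(x_train, y_train, batch_size=len(x_train), shuffle=False):
        batches_x.append(bx)
        batches_y.append(by)
        if len(batches_x) == 5:
            break
    x_aug = np.concatenate(batches_x)
    y_aug = np.concatenate(batches_y)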
Thank you. Before I start, I have to say that none of us is a computer vision expert, so the natural next step for us was convolutional nets, and since none of us had any experience, we just went onto the internet. We tried five different convolutional net architectures: LeNet, AlexNet, BN-Inception, VGG16 and ResNet. I'll briefly compare each of these models.

LeNet was introduced by Yann LeCun, and it's one of the very first ConvNets that propelled the field of deep learning. Most convolutional nets today are still based on this rough idea, which is that the network can be divided into four main steps. The first is the convolution: you have convolutional filters of different sizes, you slide them across the whole image, and you get a feature map from each filter. Then you introduce non-linearity, using a sigmoid or, in this case, a rectified linear unit (ReLU). The next step is pooling, either average pooling or max pooling. And at the end you have fully connected layers, which is essentially a multi-layer perceptron. We tried it, and the accuracy was quite bad, only slightly better than XGBoost.

Then we tried the next one, AlexNet, which happens to be the winner of ILSVRC 2012, the computer vision competition based on the ImageNet dataset. It was a significant breakthrough with respect to the previous approaches, and it basically consists of 8 layers: 5 convolutional layers and 3 fully connected layers, with dropout to prevent overfitting. Later I'll show the individual score of each model on the test set so you can compare their performance.

Then you have the BN-Inception model, from a group of Google researchers. Batch normalisation allows higher learning rates, so it requires fewer training steps and converges faster, and it dramatically reduces the number of parameters you need to deal with, from about 60 million in AlexNet to about 4 million.

Then you have VGG, which was the first runner-up of the same competition in 2014, on the same ImageNet dataset. The main contribution they introduced is very small convolutional filters, 3-by-3 filters in all the layers, and they showed that increasing the depth of the neural net leads to better performance; they can do that precisely because they use very small filters. In the model we used, the number of layers is increased to 16.

And finally you have ResNet, the winner of the same competition and of COCO in 2015, which is still the current state-of-the-art ConvNet, from the Microsoft Research team. It basically solves the optimisation issues of deeper networks, so they can push the limit of depth compared with VGG. The way they do that is by adding shortcut connections: skipping one or more of the conv layers and summing the input with the output of those layers. Thanks to Zain we found this ResNet model, which boosted our performance a lot. There are pre-trained ImageNet models ranging from 18 to 200 layers, and Zain was able to modify the scripts and fine-tune them, and it happened to be our best individual deep learning model.

These are the scores on the test set for each of the individual models. "From scratch" means we trained the model from scratch, without any pre-trained models; we only realised pre-trained models were an option in the last few days, so we didn't have much time to train the deeper networks on the full data. Fortunately, Wei Ming is the master of ensembling, which helped boost our score further and got us to ninth place. Now I'll pass it to Wei Ming.
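To make the four building blocks concrete, here is a minimal Keras sketch, much smaller than any of the networks above and purely illustrative, for 64x64 grayscale tiles and the four roof classes.

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 1)),  # convolution + ReLU non-linearity
        MaxPooling2D(pool_size=(2, 2)),                                  # pooling
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D(pool_size=(2, 2)),
        Flatten(),
        Dense(128, activation='relu'),       # fully connected layers (a small MLP)
        Dense(4, activation='softmax'),      # one probability per orientation class
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])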
OK, thank you David. At this stage of the competition all of us were very tired and exhausted, and our machines were tired and exhausted too. So what was the next strategy from here? Ensembling. We had all the individual models, and even the best one could reach roughly the top-20 positions, but we wanted to go higher and make sure we secured a top-20 position. So naturally the next step to think about is ensembling: how can we use the combined power of all the models, instead of just taking the prediction of one single model? That is the strategy we tried very hard to work out at this stage.

As usual for a multi-class prediction problem, the first thing you think about is averaging the predicted probabilities for each class; that's the natural strategy for multi-class problems. But unfortunately this challenge asked for hard predictions, which means you have to submit 1, 2, 3 or 4 for each sample rather than probabilities; they're not using log loss.

So the strategy we tried is called majority vote. Think about how, in a country, you have a president and a congress making decisions at the same time. Let's say our best model, the ResNet, is the president, and it makes the decision 1, 2, 3 or 4 for each sample. Instead of just taking its prediction for everything, we use all the other models, say 10 of them, and each of them gives a vote for each sample. If the majority of the other models agree on a different vote, then that vote becomes the final prediction, no matter what the president voted. That's the whole strategy, and it turned out to work quite well. By default we go with the president's vote, our best single model, but if, say, seven out of the ten others agree on a different label, we overturn the president's decision and use that as the final prediction. The "majority" here means all the other models except the best ResNet; we also had other ResNet models, like the one you saw just now with a lower score, but still the ResNet architecture. In the end this helped boost our ranking from around 20th to 9th, simply by re-using all the predictions we had already made. It was the final hope, and it worked out quite well, because at that stage we were not making any more predictions from new or existing architectures. So it's a simple final strategy, and it boosted our ranking to 9th.

What didn't work: as I said, ensembling the probabilities doesn't really help here, because the evaluation metric is not log loss; it could work well in other competitions where the evaluation uses log loss on the probabilities. And what we should have tried, but didn't have time to, is to use the pre-trained models to extract abstract features from the pixel level and then feed those abstract features into XGBoost, for example. That could have worked as well, but we didn't try it. So I think that's the final stage of this competition, and now I'll pass to Jawad for the second round of our competition, the finals in Paris.
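A rough numpy sketch of the "president and congress" majority vote described above, assuming hard label predictions from each model; the threshold of 7 out of 10 matches what the team later said worked best in their cross-validation, and all names here are illustrative.

    import numpy as np
    from collections import Counter

    def majority_vote(president_preds, other_preds, threshold=7):
        """president_preds: (n_samples,) labels from the best single model.
        other_preds: list of (n_samples,) label arrays from the other models.
        threshold: how many of the other models must agree to overturn the president."""
        others = np.stack(other_preds)                 # (n_models, n_samples)
        final = president_preds.copy()
        for i in range(len(president_preds)):
            label, count = Counter(others[:, i]).most_common(1)[0]
            if label != president_preds[i] and count >= threshold:
                final[i] = label                       # the congress overturns the president
        return final

    # e.g. final = majority_vote(best_resnet_preds, other_model_preds_list)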
Thanks, Wei Ming, that was an excellent explanation. Hi guys, I'm Mohammad Jawad, and that was our entire journey towards the finals, so now let's move on to what we had to face in the finals.

The finals of the Data Science Game were held in Paris, as Zain mentioned before, and it was a hackathon-style event where we had about one and a half days to come up with a predictive model, and it was sponsored by AXA. This is how I'll go forward from here: I'll give a brief introduction to the competition itself and the dataset, and then I'll talk about the methodology, the build process we went through over those one and a half days.

The challenge was to build an insurance quote conversion model, and the dataset was provided by AXA, who sponsored the competition. The dataset contained information about car insurance quotes. If a user wants to purchase car insurance, they enter all the relevant information, submit the details, and get a quote, either directly from AXA's website or from one of its subsidiaries such as brokers or agents. The quote the user gets back can be the same or different across channels, and the same user may have requested quotes from multiple subsidiaries or brokers. We had to come up with a solution to predict whether a user will convert a quote, that is, actually purchase it, and also through which of the given channels, which subsidiary or broker, they will purchase it. Among all the quotes, a user either purchases from one broker or doesn't purchase the insurance at all.

The dataset we received looked something like this: each quote that was shown to a user was provided to us, and the quote information contains the user's personal details, the policy information, the car details and so on. This is from the website where AXA collects the user's information, which is publicly available. The dataset was highly imbalanced: less than one percent of the users who got quotes actually converted them into a policy, so it was a highly imbalanced classification problem, and the evaluation metric given to us was log loss on the predicted probabilities.

So let's move on and look at the build process we used to solve this problem. This is what we basically had in mind: the first thing we wanted to do was to get to know the data well, discover insights from the data, convert those insights into features that can be used by the model, and then fit the model we chose. The competition began early on Saturday morning, and we spent about 2-3 hours working through the entire dataset, trying to figure out what makes sense, whether we could extract something, whether we could find features that would really boost the model we were going to build. That was the time spent on feature engineering, and some of the insights we got were quite interesting. For example, in European countries the insurance policy is usually valid for one year, so if the car's date is approaching one year, or just slightly more than 365 days, there's a much higher chance that the user is going to purchase the insurance, irrespective of the quote price. That was one such indicator we found just from looking at the data.
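A hypothetical pandas sketch of that anniversary insight; the column names (quote_date, car_purchase_date) are made up, since the actual AXA schema was not shown.

    import pandas as pd

    df = pd.read_csv("quotes.csv", parse_dates=["quote_date", "car_purchase_date"])
    days = (df["quote_date"] - df["car_purchase_date"]).dt.days

    # Around or just past the one-year mark: much higher chance of conversion.
    df["days_since_purchase"] = days
    df["near_anniversary"] = days.between(350, 400).astype(int)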
So we wanted to convert all these insights into features that the model can take advantage of, to improve our accuracy. We binned certain numerical features, as discussed before, and then we created various policy-based features, user-based features, user behaviour statistics and so on, and there were a lot of categorical variables, which we one-hot encoded.

These are the models we tried after feature engineering. Since this is a log-loss evaluation metric, we had high confidence in XGBoost, which is the standard go-to model for these kinds of competitions, so we tried a standard XGBoost first and it gave promising results. We still wanted to try other algorithms and see whether the accuracy would improve, so we tried random forests, XGBoost, Keras and logistic regression and an ensemble of all these models, but it was still worse than the first XGBoost model we built. So we decided not to spend too much time on that ensemble and to go back to what we do well: focus on XGBoost, on feature engineering and on tuning the model so that it would improve the accuracy. We also made use of XGBoost's feature importance, which gives you the list of important features, and we iterated on the feature engineering from there. For model tuning we used 5-fold stratified cross-validation; since this is an imbalanced dataset, we had to use stratified cross-validation. Finally, we had about 1 to 1.5 hours left at the end of the competition, so we wanted to try an ensemble of XGBoost models: we created two XGBoost models with different initialisation parameters and averaged the probabilities, and the averaged probability slightly boosted our score. So that was it, and in the final results we finished 6th.
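A hedged sketch of that pipeline using the xgboost scikit-learn wrapper: 5-fold stratified cross-validation scored with log loss, then two models with different seeds averaged for the final prediction. X, y and X_test are placeholder numpy arrays and all parameter values are illustrative, not the team's actual settings.

    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import log_loss

    params = dict(max_depth=6, learning_rate=0.1, n_estimators=300,
                  subsample=0.8, objective="binary:logistic")

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    cv_scores = []
    for train_idx, valid_idx in skf.split(X, y):
        model = xgb.XGBClassifier(random_state=0, **params)
        model.fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[valid_idx])[:, 1]
        cv_scores.append(log_loss(y[valid_idx], proba))
    print("CV log loss:", np.mean(cv_scores))
    # model.feature_importances_ can then guide the next round of feature engineering

    # Final ensemble: average the probabilities of two XGBoosts with different seeds.
    m1 = xgb.XGBClassifier(random_state=1, **params).fit(X, y)
    m2 = xgb.XGBClassifier(random_state=2, **params).fit(X, y)
    test_proba = (m1.predict_proba(X_test)[:, 1] + m2.predict_proba(X_test)[:, 1]) / 2.0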
One question: which machines, what hardware did you use to train the deep learning models? We used both AWS and our own machines; I have my own machine, and David has his "supercomputer". OK, another question on the architectures: the first one is 18 layers trained from scratch, and the second one is fine-tuning a model pre-trained on ImageNet, right? How long does it take to train from scratch, and how long does the fine-tuning take? For training from scratch, I think close to 2 days running on an NVIDIA GTX 1080. The pre-trained model takes less than a day, maybe around 6 to 8 hours, depending on the learning rate you set; for fine-tuning a pre-trained model you usually need to set a lower learning rate, so you don't change the weights too much. What about the winners of the challenge: other than rotating images, what did they do?

For the data augmentation in the first challenge, I mentioned just now that they rotated the images to create more training data, in case you missed it: rotating by 90 degrees turns label 1 into label 2 and vice versa, so you switch the labels and get about 90% more training data. One of the top teams also used pre-trained models, unlike what we did: they used the pre-trained models to extract features from the images, and then they trained a random forest or something similar on top of those extracted features, so it's basically stacking: pre-train on these images, extract the features, which are just a bunch of numbers, then train another model on top of them. And one of the teams had access to large computing clusters, because they are mostly university researchers, so they could train much faster.

About the real winners, I have to mention one thing: if you look at the actual data, you will see that for the samples the models get wrong, even humans like us cannot get the correct label. So how can the model learn to predict it correctly? The model is trying to get close to the humans, because we are learning from the ground truth, the ground truth is labelled by humans, and there are actually quite a lot of errors in the dataset, images that humans couldn't label reliably. So what I mean is that at the very top it is probably partly luck. I'm not saying it's all luck, but that's the truth: if you look at the samples our single best models got wrong, it's impossible to tell the ground truth. Above roughly 80% they are all good models, and a difference of 1% or 2% is nothing at all; there's no real difference.

Any other questions? On the second challenge, the one in Paris, what did the winning team do to make their model better? The team that came first used just XGBoost, without ensembling any models, so the difference must have come from the feature engineering part, where they could have figured out more features, and from the tuning of the parameters; that's where the difference should have come from. Actually, one of the features they used that we forgot to use is topic modelling, LDA, on the categorical features. It's quite a common strategy that we forgot to apply: if you have, say, a thousand categories, you run topic modelling or something similar to reduce the dimension to, say, ten, and use those as additional features. We forgot to do that during the final competition.
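As a sketch of that idea (the column names are invented, and this is not the winning team's code): treat each row's categorical values as a tiny "document", and use LDA to compress the one-hot matrix into a handful of topic features.

    import pandas as pd
    from sklearn.decomposition import LatentDirichletAllocation

    cat_cols = ["car_model", "broker", "region"]          # hypothetical categorical columns
    counts = pd.get_dummies(df[cat_cols].astype(str))     # bag-of-categories count matrix

    lda = LatentDirichletAllocation(n_components=10, random_state=0)
    topics = lda.fit_transform(counts)                    # (n_rows, 10) topic mixtures

    for k in range(topics.shape[1]):
        df["cat_topic_%d" % k] = topics[:, k]             # extra features for XGBoost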
How many features were you using, and during the feature engineering part did you find any particularly useful feature engineering techniques?

Yeah, OK. The really useful one is the number of quotes per user, what we call the behaviour features. This is actually very useful for real-world problems too, for example email campaigns and similar things: user behaviour is always among the most important features for predicting the response of users. The more quotes a user requests, the more likely they are just spamming or just trying to compare between different companies. That is one of what we call the golden features, the ones that give a huge improvement and that differentiate the top teams, so it's one of the features you must get. Other useful features are similar user behaviour features, and product-based features, like what kind of product you can get on top of the original insurance, and so on. And do you use any special feature engineering techniques? Like I said, topic modelling on the categoricals, which the winning teams tried, and "who buys" likelihood statistics; I can't reveal too much of the secret, but basically "who buys" statistics.
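A small pandas sketch of those two kinds of features, with hypothetical column names (user_id, channel, converted); in practice the "who buys" statistic has to be computed on the training fold only, to avoid leaking the target.

    import pandas as pd

    # Behaviour feature: how many quotes this user requested in total.
    df["quotes_per_user"] = df.groupby("user_id")["user_id"].transform("count")

    # Simple "who buys" likelihood statistic: historical conversion rate per channel.
    channel_rate = df.groupby("channel")["converted"].mean()
    df["channel_conversion_rate"] = df["channel"].map(channel_rate)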
When you apply the ensemble strategy, how do you deal with ties, and why not just use weighting? Oh, you mean the ensembling part. So you're asking why not treat all the models as equivalent, without any president or congress, and just attach a weight to each vote. First of all, we didn't have much time to try different strategies at the last minute. But the basic idea is very simple: we just wanted to improve on the best single model we had, and we didn't want to get worse, because if you try different weighting schemes it can turn out worse. Based on that idea we defined the majority vote: only when the majority votes for a different prediction than the best single model do we use the majority vote instead of the best single model; otherwise we keep the best single model, because everyone agrees on the same thing anyway. But the majority could be split between two candidates, right?

Yeah. The majority, say we have ten other models, should be more than five, more than 50%, and we tried thresholds like 6, 7, 8 and 9; the one that gave the best accuracy based on the cross-validation scores turned out to be 7, I think. So it's trial and error, because you don't know the right threshold in advance. If you set it higher, like 8 or 9, you get fewer overturned predictions; the overturns have a much higher true-positive rate, but the overall difference becomes meaningless, like 0.001%, so you can't make a big difference. If you lower it too much, it will overturn predictions wrongly, giving false positives. So you have to find the best threshold. OK guys, the next team is coming up, so let's save some questions for YNC X.

Thanks. We're from YNC X, an undergraduate team from Yale-NUS College, a liberal arts college right here in UTown. And thanks, Data Ninja, for giving a really comprehensive explanation of the two challenges; at this level everyone near the top of the leaderboard does pretty much the same, very similar things, so they did a lot of the heavy lifting for us and our presentation can be a lot lighter. To start off, there's a video about the Data Science Game, would you guys like to watch it? Yes? OK, over here.

(Video plays.) It's a massive worldwide competition, and with that comes a certain expectation. The Data Science Game represents a unique opportunity to meet very smart students, and potentially meet a very large customer of ours who has a lot of very creative problems, and it's going to be fun. We want to gather students from all over the world in Paris and let them share their knowledge with team-mates, mentors and other students, and learn from each other, which makes the Data Science Game a unique kind of experience. The Data Science Game really is a forum where students and the private sector can exchange. It's a very interesting channel, because we get to understand how students work and how they handle the data, and to do well in the challenge they need to incorporate ideas from elsewhere, such as knowledge of insurance, on top of the data scientist's skills. It's more than technical skill or technical thinking; it's about thinking about the business, really understanding what they do and why they do it, and on the other side transforming what they discover into insights and ideas. The international spirit of the Data Science Game brings students and teams from all over the world to Paris. I talked to many of them, from some of the best groups around the world, and I learned a lot from them too. We hope to do the same next year, so see you soon.

So it was a pretty solid time, two days locked up in a castle; you can imagine they tried to make it fun. We will present our insights and our experiences from the two challenges. I guess first we can introduce ourselves, so I'll start: I'm Sean, a fourth-year undergraduate student at Yale-NUS; I study computer science, and I'm doing research in AI, deep learning specifically. Hello guys, I'm Rohan; I'm actually taking a leave of absence from Yale-NUS, and I'm running an organisation at the moment.
My name is Ruan Medu; I'm a senior at Yale-NUS, a fourth-year student like Sean. I study physics, specifically astronomy and astrophysics, so I'm just dabbling with computing and all these concepts.

The interesting thing is that this was the first-ever data science competition for any of us, so we learned a lot, especially when we went to Paris, where there were a lot of amazing experienced professionals, PhD students and master's students. We want to share the things we learned, and hopefully, if any of you are interested in participating next year, or in doing Kaggle competitions, it will help you.

We'll start with the preliminary challenge. A lot of it has already been covered, so we'll be pretty quick. For pre-processing we did many of the things the Data Ninja team did, but some of the main ones were cropping to a uniform size and rotation: one of the first things we did was augment the dataset by turning the east-west roofs into north-south ones and vice versa, which is a good thing to do. For regularisation we used L2 regularisation, which seemed to increase our accuracy a little bit. Now we'll talk about our models and our results.

Cool. Before I even get into the deep learning models, you have to understand that we were very scrappy; we hardly had any hardware. Rohan was running stuff on his Samsung laptop, which was really old, stuff was running on my Surface, which I was just carrying around during meetings, Sean also ran some things, and we had a teammate, Jinan, who isn't here, whose laptop I had to TeamViewer into to run stuff. So we were really scrappy, even hesitating to spend $10 on AWS; so many stories. The reason we chose these models was exactly that: we were limited by hardware. In fact, even before running any deep learning model we were doing simple OpenCV stuff, classic computer vision on the roofs; we ran Hough lines just to find the edges and see whether that would yield something, and we realised that maybe deep learning was the way to go.

So we ran our first AlexNet model, which didn't perform too well, around 50-ish percent. Then we decided our hardware could take a bit more, so let's run Inception V3 and do a bit of transfer learning; some of you might know there are a lot of ways to do this. With Inception V3 we started getting around 70-plus percent. Wait, no, actually that wasn't the case: our first submission was around 40 percent, and we realised something was wrong. Then I told Rohan, I think the labels are just the other way around: whatever is label 2 is actually 3, whatever is label 1 is actually something else. So we just swapped the labels, and then we got 70-plus percent. OK, never mind, we were on the right track.
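A hedged Keras sketch of that transfer-learning setup: Inception V3 pre-trained on ImageNet with the original top removed, the convolutional base frozen, and a small new head for the four roof classes. The input size and layer widths are illustrative, and the grayscale tiles would need to be resized and replicated to three channels.

    from keras.applications.inception_v3 import InceptionV3
    from keras.models import Model
    from keras.layers import GlobalAveragePooling2D, Dense

    base = InceptionV3(weights="imagenet", include_top=False, input_shape=(139, 139, 3))
    for layer in base.layers:
        layer.trainable = False                  # keep the pre-trained weights fixed

    x = GlobalAveragePooling2D()(base.output)
    x = Dense(256, activation="relu")(x)
    out = Dense(4, activation="softmax")(x)      # north-south / east-west / flat / other

    model = Model(inputs=base.input, outputs=out)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    # model.fit(x_train_rgb, y_train, ...) once the 64x64 tiles are resized and stacked to 3 channels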
Then we realised we weren't going to be able to run a ResNet next, because none of our computers could handle a deep ResNet, and we couldn't run the TensorFlow Inception on multiple computers; we had about 3 days left. Just to be clear, we only started working on this in the last 5 days before the actual submission, so no hardware, and we were rushing. Then I decided, OK, let's just do a ResNet, and I'll run it on my iMac, finally something big enough. So I ran it on my iMac, and then I realised that if it only ran there, we would be able to train maybe one model for the rest of the competition, and that was too risky. So I told Sean: hey Sean, I've put our stuff up on GitHub, just download it and run it; and I told Rohan to run the same thing, so everyone was running things at the same time. Before we actually did that, do you guys know that we never met in person while doing all this? Everything was done remotely; we were calling each other for everything. So I told Jinan, OK, let's run some stuff, and she kindly just left her laptop on, I TeamViewered in and ran the jobs.

This goes to the next point, which is the implementation. We used Keras, with TensorFlow and Theano underneath, but I think what saved us, at least a small part of it, was using Docker. I don't know whether you've had the experience of running all these frameworks: you have to reinstall the framework every time, create environments, it's so painful. So I decided, OK, let's do everything over Docker, so that when Sean runs something he can just pull the container and run it, and everyone can do it at the same time. That saved our lives quite a bit. Because of all this, we were inching towards the 80 percent mark, just like those guys, and then 81 percent, and that was when we started ensembling. To share that experience, I'll hand over to Rohan.

Yes. So we had these models at 70-ish, 80-ish percent and we were completely clueless about what to do next, because none of us, other than Sean perhaps, had actually studied machine learning or data science or any of this. But I thought of something on my own that seemed great: take all these models together and give them a sort of internal voting system, because each of them is probably learning something different, right? So when you put them all together, you get something that has learned more than each of them individually. For our first ensemble we didn't even know the word "ensembling"; in fact, after the competition was over and we looked at the forums, we were like, oh wow, so this thing is called ensembling. Later we learned that the first thing we tried is called majority-class ensembling, which is basically a fancy way of saying that you ask your models to vote and you just trust the majority. With that approach we gained about a percentage point over our best model, which is significant given that in this competition everyone was basically clustered around the 82-83 percent mark, so going from 80 to 81 was very cool.

But then I decided to go a little crazier with this. The voting mechanism was working, so let's think of an even more granular voting mechanism. I started thinking in terms of classes: I realised that some models were better at picking out flat roofs, while some models were really good at picking out the north-south roofs. So for each of these models I went and looked at the validation score per class, and then I weighted the votes based on the per-class validation accuracy. For example, I know that when one particular model says a roof is north-south, it probably is correct, because it almost always gets north-south right; but there was another model that looked best overall and yet was weak whenever it said a roof was flat, so we shouldn't trust it on that class.
So I weighted the votes by the internal cross-validation score for each class, and that was the ensemble model that we finally submitted, with barely a few minutes to go. Can I add something? You have to understand that at this point we were one or two hours from the deadline, Rohan was doing this other ensembling thing, and he just told me, dude, I'm going to do one more ensemble, and I'm like, Rohan, are you sure? And it got us to 17th place. So yeah, it worked out; it was a pretty sweet evening.

What would we do differently? Get better computers, and start earlier, maybe more than 5 days before, because one thing we realised was that we had to kill a lot of models while they were still getting better. You keep track of the AUC score after every epoch during training, and we saw that some of our models were still improving, but we didn't have time to leave them running for another few hours, or even half an hour. I'm sure that if we had let them train longer, or go deeper, we would definitely have gotten a better score. So basically deeper models, which goes back to our very compromised hardware situation; it's all related.

Also, the more models you have, the more ensembling you can do. One very interesting thing we realised is that one of our ensemble models was really bad in terms of accuracy, only about 60 or 70 percent, but because of the class-wise validation strategy, we realised it was actually very good at predicting one particular kind of roof, so its weight for that kind of roof ended up pretty high. If you think about it, the more models we had, the larger our ensemble pool, the better this whole voting thing could have worked. The lesson is: don't throw away even the models that seem less accurate, because there might still be some learning in them; it's worth throwing them in and seeing whether they add something or not.

And more pre-processing choices: the previous team did all these super cool shears and rotations and augmented the dataset quite a lot. The only augmentation we did was the first thing we mentioned, flipping the north-south and east-west roofs into each other. We didn't do any of the simple computer vision shears, rotations, or zoom in and zoom out. Of course we could have, but then my RAM would have crashed, which takes us back to the compromised hardware and better resources.
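An illustrative numpy version of that class-weighted vote (shapes and names are assumptions, not the team's code): each model's vote is weighted by its own per-class validation accuracy, so a model that is reliable on, say, north-south roofs counts for more whenever it predicts north-south.

    import numpy as np

    def class_weighted_vote(preds, class_acc, n_classes=4):
        """preds: (n_models, n_samples) hard label predictions.
        class_acc: (n_models, n_classes) per-class validation accuracy of each model."""
        n_models, n_samples = preds.shape
        scores = np.zeros((n_samples, n_classes))
        for m in range(n_models):
            for i in range(n_samples):
                c = preds[m, i]
                scores[i, c] += class_acc[m, c]   # vote weighted by how good this model is on class c
        return scores.argmax(axis=1)              # label with the highest weighted support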
What are we moving on to next? Yes, better resources. The thing is, Yale-NUS is a very young college, we only started 3 years ago, and we realised that a lot of our friends' universities have large compute clusters. So over the past few months we've been telling the school that we need a cluster, and recently the school finally got GPUs, so now we have a cluster running as well, paving the way for future Yale-NUS students to actually run deep learning models.

About the final challenge: we initially thought we couldn't talk about it at all, because we had to sign an agreement with AXA, so we were a bit worried about this, but we do want to talk a bit about our key insights from the project. What's interesting is comparing the final challenge with the preliminary challenge. As David explained, it's about insurance policies and predicting the probability of conversion of a customer, and you have all these constraints, like a very skewed dataset. Another difference between image classification and a challenge like this is that in an image, your data points are just pixels, it's uniform; here it's a spreadsheet of different types of values, you have discrete values, you have continuous variables, it's just a mess. Deep learning is actually not the best strategy in this kind of situation, because deep learning works best when your data points are uniform, as in image or text classification. Because the nature of the data is different, it's really important to understand the data.

What does that mean? Take feature engineering: Data Ninja realised that the conversion rate is really high around the anniversary of the car or policy, and that's a great point. Another thing we realised is that sometimes you find the same customer looking at quotes for different cars, like 3 or 4 cars, and for a customer with multiple cars the conversion rate is really, really low. We tried to put ourselves in the customer's shoes, going through the form. It's really important, as Wei Ming said, that these labels come from humans, so the machine has to learn what a human is thinking when looking for quotes. We reasoned that it's pretty unlikely that a person actually owns 3 or 4 cars, so in what situation would a person request quotes for 4 different cars? We realised that such a person is actually shopping for the car itself: which car would give me a better insurance quote? That gives you the insight that the person is not looking to buy insurance, but trying to see which car is a good buy based on the insurance. And that fits the data as well: if the person is shopping for cars, then obviously they won't convert. So that's something we implemented and engineered as a feature. That's what really matters for these kinds of datasets: you have to look at the data points. Tableau, or even a simple R scatter plot, can give you a lot more insight than blindly throwing everything into a black box like a neural network. That's one key insight that we got.
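A sketch of that insight as a feature, with made-up column names (customer_id, car_id): count the distinct cars each customer requested quotes for, and flag the likely car-shoppers.

    import pandas as pd

    cars_per_customer = df.groupby("customer_id")["car_id"].transform("nunique")
    df["n_cars_quoted"] = cars_per_customer
    df["likely_car_shopping"] = (cars_per_customer >= 3).astype(int)   # probably shopping for a car, not for insurance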
And ensembling: as Rohan said, we didn't even know it was called ensembling until we read the forums; we were like, oh cool, let's do ensembling at the final challenge too. So we used XGBoost; we had an ensemble of several XGBoost models. Again we were limited by resources, we only had our laptops, literally just our laptops, and the Azure platform that they provided was not even close to enough to train all the models we wanted, so we only had three or four XGBoost models. What I wanted to do was have a whole random-forest- or AdaBoost-style ensemble, where you take 200 XGBoost models, ensemble them, and maybe have multiple layers of that; that was my ideal, but obviously we didn't get that far because of the hardware limitations. I think that's it for the models.

Yeah, and of course, when we narrate what happened it sounds so streamlined, right? "Oh, we went in and we executed." That's completely false. You open the data and it's like, OK, great, there's this CSV file with a few million rows, and what do these columns mean, what does anything mean, how do you even get started? Because that's how it hits you at first: you don't have it all figured out as to what will give the best accuracy when you're starting, you're just exploring.

This thing called feature engineering basically means you add new columns to the dataset; you do some sort of transformation. One thing Sean pointed out was that we figured out that people who were looking at a lot of cars were not actually looking to buy insurance, they were just shopping around, so that was a kind of proxy. So a whole set of the features we created were driven by this insight, to capture the customers who are not serious. Who is actually requesting a quote in order to buy insurance, and who is here just to get a quote? We came up with several features around that, for example how many quotes a person has requested, and whether the request has certain other characteristics, combinations that just don't make sense for a genuine buyer, which tell you this person is almost certainly not serious about buying insurance. So we built features that capture how serious a customer looks.

Those features helped a lot. What we took away is that the biggest gains came from feature engineering: for this kind of tabular problem, the teams that do well do it through feature engineering, and we used XGBoost more than anything else. That was really the core of our approach.
The other teams that finished near the top also had great features, and even then things don't always work out. So it was really fun for us, and we learned a lot; in the end we finished at around 20th place.

Since we still have some time, I want to share a little about what it was like to actually be there, because the venue we stayed at was a huge castle and it was our first time at something like that. One thing I really remember: at one point, around 2 p.m., we thought we had had an epiphany, that we had figured something out, and all of us were hunched over our machines, stuck. I stepped away, walked around and came back, and we looked over at the other teams. They seemed to be at their screens the whole time, but actually, much of the time they were just watching their processes run. One observation we had is that the strong teams were not glued to their keyboards the whole time; they took breaks. We were stuck at that point, so we said, fine, we'll take a break too. That was one thing we picked up from the other teams.

The other thing we got was from the mentors, who were very helpful. Every time they came to our table they would ask us to think about one thing, to take a step back and look at the problem from a distance, and that's usually when we had our interesting ideas. During the competition, the teams that did well took breaks; many times they would just step away for 30 minutes or a couple of hours, and that is what I would call looking after your mental state: they just took a break and didn't touch anything. These are the interesting takeaways from the competition, and I think we will carry them with us into whatever we do next. If there are any questions, you're all welcome to ask. So I think that's it. Sean, Rohan? Yeah, thank you very much.

Any questions? One question: of the models you listed, which ones did you actually end up using?

Yeah, so the ones we listed were the ones we used: ResNet, Inception V3 and AlexNet. Those were the three. Maybe we used LeNet at some point, but I didn't think it made sense to use LeNet for this.
So, regarding the insight you had about people shopping for cars instead of shopping for insurance: were there any other insights like that, ones where you could probably resell the information, say to a car seller? I don't know how much we're allowed to talk about that. But yeah, that's the beauty of data science: if you have a hunch, you can actually verify it using these models, and then maybe you could become a data consultant on the back of it.

A bit of a philosophical one, so feel free not to answer: you mentioned that the Russian team was doing a lot of automation. Is that the future of data science, in a way? Because you are talking about hunches and feel, and these guys are coming in with something that is probably brute-force fine-tuning of the models themselves.

Yeah, that's a great question; I've been thinking about that for a while. It's an actual field now, and it came up in one of the talks at the opening ceremony: you don't even need a data scientist, you just run the data through a system and it somehow figures out what to predict and predicts it well. And I think, yeah, why not.

Can I add to that? When the Russian team presented, they had actually done a lot of feature engineering based on their own knowledge. They are the only team I recall that presented that they went through the AXA website: they actually went through the whole process of getting a quote from AXA, to get a feel for what it's like to request a quote on AXA's platform. They were the only team that did that. So I think it's not fair to say that they just relied on automating everything; they also put in a considerable amount of effort.

Yeah, thank you. I think what we meant is that they're so experienced that when you give them a Kaggle-style competition, they pretty much know what to do: they can try all these different feature engineering techniques, they can build covariance tables and see which variables are related to each other, maybe do some PCA on the spot, and they probably have a whole pipeline. I mean, one of the guys on their team is ranked number one on Kaggle, so I wouldn't be surprised if he has his own libraries that basically automate a lot of these processes. But again, as Zain said, they still put in a lot of effort to gain an edge, and that's perhaps something AutoML cannot give you. So maybe for now you still need humans to do this; it's more than pressing a button.

OK, it's getting late, so let's thank the teams for sharing.