Hi everyone. I am Vinita, and I am from India. My Outreachy project is about automatically detecting spam bot registrations using machine learning techniques, something similar to Google's Invisible reCAPTCHA. My mentors are Gergő Tisza and Adam Roses Wight. I had two amazing mentors.
We have all seen Google's Invisible reCAPTCHA on many websites, where you click on a small box and it works out whether you are a robot or a human being. Initially, Google had CAPTCHAs which displayed text. Right now, when you create an account on Wikipedia, you are shown a CAPTCHA made of distorted text. Humans find it very difficult to read the words in the image because they are distorted, but many bots can crack this CAPTCHA easily by using advances in OCR and similar techniques.
Our aim is to create a CAPTCHA system, something similar to Google's Invisible reCAPTCHA, that is very convenient for humans and at the same time prevents bots from creating accounts. We do this by building a machine learning model. Our constraint is that we cannot use private data that could identify the user. Instead, we have to preserve user anonymity when we create the features and when we record the data the user generates while creating the account.
The first phase was building the system that collects the data. You may have seen this recently on the Wikipedia account creation page. We used two basic features and variations of them. We depend on key press information, for example when users type the username. Some patterns are shown by legitimate human users and others by bots. We didn't use the raw data because that could reveal the identity of the person. Instead, we used statistics of the key press data such as the mean, variance, and kurtosis. We also used mouse movement data. The details of the features can be seen in the schema.
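
To make the idea of summary statistics concrete, here is a minimal sketch of turning key press timings into a mean, variance, and kurtosis so that the raw timings need not be stored. The function name `keypress_summary`, the choice of inter-key intervals in milliseconds as the input, and the use of excess kurtosis are assumptions for illustration; the talk does not say how the project computes them.

```python
from statistics import fmean, pvariance


def keypress_summary(timestamps_ms):
    """Summarise the gaps between key presses (hypothetical helper).

    Only the summary statistics are returned; the raw timings are dropped.
    """
    # Gaps between consecutive key presses, in milliseconds.
    intervals = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    if len(intervals) < 2:
        return None  # too little typing to summarise

    mean = fmean(intervals)
    variance = float(pvariance(intervals, mu=mean))
    if variance == 0:
        # Perfectly regular typing: kurtosis is undefined, so this sketch
        # reports 0.0 by convention (an assumption, not the project's rule).
        kurtosis = 0.0
    else:
        fourth_moment = fmean([(x - mean) ** 4 for x in intervals])
        kurtosis = fourth_moment / variance ** 2 - 3.0  # excess kurtosis

    return {"mean": mean, "variance": variance, "kurtosis": kurtosis}


if __name__ == "__main__":
    # Evenly spaced key presses, as a simple script might produce.
    print(keypress_summary([0, 100, 200, 300]))
    # Uneven key presses, closer to human typing.
    print(keypress_summary([0, 100, 300, 600]))
```

A summary like this keeps the shape of the typing rhythm while discarding the exact sequence of timings, which is the trade-off described above.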
We started collecting data in February. While observing the data, we found that most bots were active within three days of registering. In other words, the bots did not sleep for long after the account was created. 70% of the bots started spamming within three days and were blocked within three days. This is important to us because we can see the results of our experiments quickly. Bots that sleep for longer would be harder for us to work with. Another interesting observation is that 90% of the bots were blocked within 18 days of registration. We also had information about accounts registered through the API and through mobile registration. We found that the number of bot registrations through the API and mobile was negligible compared to those created through the web interface.
To create the machine learning model, we had collected some data in February. We got 126 bot data samples. We had obtained a large number of human data samples, but we used 183 of them. We used an XGBoost classifier to create the model. It gave an accuracy of 58%. This is significant because we were using only statistical data as features; we couldn't use the more obvious features because of privacy concerns. It is also promising because we now have ideas for improving on it: instead of using one aggregate feature, we can use field-wise features, such as separate features for the username field, the email address field, and even the CAPTCHA field. That should give a better feature representation. We have ideas to improve the performance of the model and we will be working on them.
Thank you. That's all I have done in this Outreachy work. This was research-oriented work, and there is scope for further improving the model. Thank you.
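
The difference between an aggregate feature and field-wise features mentioned in the talk can be sketched as follows. This is only an illustration under stated assumptions: the field names, the input layout (a dict mapping each form field to its key press timestamps in milliseconds), and the helper names are hypothetical and are not taken from the project's schema.

```python
from statistics import fmean, pvariance


def intervals(timestamps_ms):
    """Gaps between consecutive key presses in one form field."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]


def stats(values):
    """Mean and variance of a list of gaps (zeros if the list is empty)."""
    if not values:
        return {"mean": 0.0, "variance": 0.0}
    return {"mean": fmean(values), "variance": float(pvariance(values))}


def aggregate_features(keypresses_by_field):
    """One set of statistics over the gaps pooled from all fields."""
    pooled = [gap for ts in keypresses_by_field.values() for gap in intervals(ts)]
    return stats(pooled)


def fieldwise_features(keypresses_by_field):
    """A separate set of statistics per field, e.g. username_mean."""
    features = {}
    for field, ts in keypresses_by_field.items():
        for name, value in stats(intervals(ts)).items():
            features[f"{field}_{name}"] = value
    return features


if __name__ == "__main__":
    # Hypothetical registration: slow, even typing in one field and
    # fast, even typing in another.
    sample = {"username": [0, 100, 200], "email": [0, 10, 20]}
    print(aggregate_features(sample))
    print(fieldwise_features(sample))
```

In this toy input the pooled statistics blur two very different typing speeds into one mean and a large variance, while the field-wise version keeps them apart, which is the kind of richer representation the field-wise idea is aiming for.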