Our proposed methodology combines pre-trained language models and tokenizers to distinguish malicious code from benign code. We first extract the top-ranked files from the dataset, which contain the most relevant information. We then pre-tokenize these files and pass them into a BERT-based classifier, which accurately separates malicious from benign code with a weighted F1 score ranging from 82% to 99%. This article was authored by Abir Rahali and Moulay A. Akhloufi.
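The pre-tokenization step above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the regex-based `pre_tokenize` function and the 512-token cap (BERT's standard input limit) are assumptions, and a real pipeline would follow this with BERT's own WordPiece tokenizer before classification.

```python
import re

MAX_TOKENS = 512  # standard BERT input-length limit (an assumption here)

def pre_tokenize(source: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Split raw file contents into coarse word/punctuation tokens and
    truncate to the model's input limit. A downstream BERT tokenizer
    would further split these into sub-word pieces."""
    # Identifiers, numbers, or any single non-space symbol become tokens.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return tokens[:max_tokens]

# Example: a small snippet typical of obfuscated script content.
sample = 'eval(atob("bWFsd2FyZQ=="))'
print(pre_tokenize(sample))
```

The truncation matters because BERT-based classifiers cannot attend beyond their fixed input length, so keeping the most relevant (top-ranked) files first, as the method does, preserves the most informative tokens.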