This study evaluates the performance of two large language models (LLMs), GPT-3.5 and GPT-4, on the MIR medical examination, which governs access to medical specialist training in Spain. The results show that GPT-4 outperformed GPT-3.5, with an 86.81% correct response rate in Spanish, and that performance was slightly higher on English translations of the questions. GPT-4 achieved a 100% correct response rate in several areas but performed worse in pharmacology, critical care, and infectious diseases. The error analysis revealed that the gravest error categories had a 0% rate, against an overall error rate of 13.2%. Despite the high success rate, understanding error severity is critical for patient safety in real-world medical practice. This article was authored by Francisco Guillen-Grima, Sara Guillen-Aguinaga, Laura Guillen-Aguinaga, and others.