This study evaluated ChatGPT's capacity for ongoing clinical decision support by inputting all 36 published clinical vignettes from the Merck Sharpe & Dohme (MSD) Clinical Manual into ChatGPT and comparing its accuracy on differential diagnosis, diagnostic testing, final diagnosis, and management, stratified by patient age, gender, and case acuity. The model performed best at making a final diagnosis, with an accuracy of 76.9%, and worst at generating an initial differential diagnosis, with an accuracy of 60.3%. Overall, ChatGPT achieved 71.7% accuracy across all clinical vignettes. Limitations include possible model hallucinations and the unclear composition of ChatGPT's training dataset. This article was authored by Arya Rao, Michael Pang, John Kim, and others.