The study investigates whether semi-supervised natural language processing (NLP) of free text in electronic health records (EHRs), combined with structured EHR data, can improve the identification and treatment of non-valvular atrial fibrillation (NVAF), a major contributor to stroke that remains substantially underdiagnosed and undertreated despite explicit guidelines for oral anticoagulation. The study found that the structured-plus-unstructured method would have identified 3,976,056 additional true NVAF cases and improved sensitivity for CHA2DS2-VASc and HAS-BLED scores compared with structured data alone, a 32.1% improvement. It also estimated that this method would prevent 176,537 strokes, save 10,575 lives, and save US $3.5 billion. The article was authored by Peter L. Elkin, Sarah Mullen, Jack Mardekian, and others.
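
To make the sensitivity comparison concrete, the sketch below computes sensitivity (true cases detected divided by all true cases) for a structured-only detector and for a structured-plus-unstructured detector. It is an illustration only: the `Patient` fields, the toy cohort, and the rule that a case counts as detected when either the structured data or the NLP output flags it are assumptions made for this example, not the study's actual pipeline or data.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All fields are hypothetical, chosen only for this illustration.
    true_nvaf: bool       # reference-standard label
    structured_hit: bool  # NVAF flagged in structured EHR data
    nlp_hit: bool         # NVAF flagged by NLP of free-text notes


def structured_only(p: Patient) -> bool:
    return p.structured_hit


def structured_plus_unstructured(p: Patient) -> bool:
    # Assumption: a case is detected if either source flags it.
    return p.structured_hit or p.nlp_hit


def sensitivity(patients, detector) -> float:
    """Fraction of true NVAF cases that the detector identifies."""
    true_cases = [p for p in patients if p.true_nvaf]
    if not true_cases:
        raise ValueError("sensitivity is undefined without true cases")
    detected = sum(1 for p in true_cases if detector(p))
    return detected / len(true_cases)


# Toy cohort with made-up values; not drawn from the study.
TOY_COHORT = [
    Patient(true_nvaf=True, structured_hit=True, nlp_hit=False),
    Patient(true_nvaf=True, structured_hit=True, nlp_hit=True),
    Patient(true_nvaf=True, structured_hit=False, nlp_hit=True),
    Patient(true_nvaf=True, structured_hit=False, nlp_hit=False),
    Patient(true_nvaf=False, structured_hit=False, nlp_hit=False),
    Patient(true_nvaf=False, structured_hit=False, nlp_hit=True),
]

if __name__ == "__main__":
    print(sensitivity(TOY_COHORT, structured_only))
    print(sensitivity(TOY_COHORT, structured_plus_unstructured))
```

In this toy cohort the third patient is a true case visible only in free text, which is the kind of case the combined method is meant to recover. The last patient shows the trade-off: an NLP flag on a non-case does not affect sensitivity but would lower specificity.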