The basal ganglia (BG) play key roles in reinforcement learning (RL), decision-making, and risk-taking. They accomplish these tasks through complex circuitry and dynamic dopamine (DA) modulation, mechanisms that differ from those of standard artificial RL agents. To assess the normative advantages of this circuitry, we developed the OpAL model. OpAL uses opponent pathways to differentially emphasize the history of positive or negative outcomes for each action; dynamic DA modulation then amplifies the pathway best tuned to the task environment. This efficient coding mechanism avoids a vexing explore-exploit trade-off that plagues traditional RL models in sparse-reward environments. OpAL exhibits robust advantages over alternative models, particularly in environments with sparse rewards and large action spaces. These advantages depend on opponent and nonlinear Hebbian plasticity mechanisms previously thought to be pathological. Furthermore, OpAL captures patterns of risky choice arising from DA and environmental manipulations across species, suggesting that these patterns result from a normative biological mechanism. This article was authored by Alana Jaskir and Michael J. Frank.
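The opponent-pathway scheme described above can be sketched as a minimal two-armed-bandit simulation. This is an illustrative sketch only: the bandit, parameter values, and variable names are assumptions for the example, not taken from the article; the updates follow the general opponent-actor form of separate "Go" and "NoGo" weights with Hebbian, prediction-error-scaled learning and a DA-like asymmetry at choice time.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 2
V = np.zeros(n_actions)   # critic: expected reward per action
G = np.ones(n_actions)    # "Go" actor: emphasizes history of positive outcomes
N = np.ones(n_actions)    # "NoGo" actor: emphasizes history of negative outcomes

# Hypothetical parameters: rho > 0 mimics elevated dopamine,
# which amplifies the Go pathway relative to the NoGo pathway.
alpha_c, alpha_a, beta, rho = 0.1, 0.1, 1.5, 0.5

reward_probs = np.array([0.8, 0.2])  # hypothetical bandit arms

for _ in range(500):
    # DA-modulated choice: dopamine asymmetrically weights the two actors
    act = beta * ((1 + rho) * G - (1 - rho) * N)
    p = np.exp(act - act.max())
    p /= p.sum()
    a = rng.choice(n_actions, p=p)

    r = float(rng.random() < reward_probs[a])
    delta = r - V[a]                  # reward prediction error
    V[a] += alpha_c * delta           # critic update
    # Nonlinear (Hebbian) three-factor updates: the weight change
    # scales with the current weight, so each actor preferentially
    # accumulates its own sign of outcome history.
    G[a] += alpha_a * G[a] * delta
    N[a] += alpha_a * N[a] * -delta
```

After training, the critic ranks the richer arm higher, and the Go/NoGo weights encode the positive and negative outcome histories separately, which is what lets DA-like modulation re-weight them to suit the environment.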