The American Medical Association (AMA) urged federal lawmakers Thursday to strengthen safeguards for artificial intelligence chatbots as their use for mental health increases.
The organization penned letters to the co-chairs of the Congressional Artificial Intelligence Caucus, the Congressional Digital Health Caucus and the Senate Artificial Intelligence Caucus. While the organization recognized lawmakers’ efforts toward “advancing conversations about AI’s role in society and mental health,” it said the rise of mental health chatbots, along with reports of chatbots encouraging self-harm and of privacy breaches, “highlights the urgent need for clear guardrails.”
The AMA recommended that lawmakers:
- Enforce transparency standards and penalize deceptive practices, including systems presenting themselves as licensed clinicians
- Create a modern, risk-based oversight framework and clarify when AI tools qualify as medical devices
- Mandate ongoing safety monitoring and adverse event reporting
- Require strict data protection standards
The AMA uses the term “augmented intelligence” when referring to AI to emphasize the technology's assistive role in medicine.
“AI-enabled tools may help expand access to mental health resources and support innovation in health care delivery, but they lack consistent safeguards against serious risks, including emotional dependency, misinformation, and inadequate crisis response,” said John Whyte, M.D., AMA CEO, in a statement. “With thoughtful oversight and accountability, policymakers can support innovation and ensure technologies prioritize patient safety, strengthen public trust, and responsibly complement—not replace—clinical care.”
A March Rock Health survey found 32% of respondents use AI chatbots to seek out health information, with 28% of AI users reporting that they turn to chatbots to help manage mental health or stress. Despite the increasing reliance, Mass General Brigham researchers found publicly available generative AI models often fail to properly navigate diagnostic situations.
While all 21 large language models (LLMs) analyzed in the study achieved a correct final diagnosis more than 90% of the time, all models failed to produce an appropriate differential diagnosis more than 80% of the time, a finding that researchers say emphasizes that AI models should “augment—not replace—physician reasoning.”