Early signs of cognitive decline often appear not in a formal diagnosis but in small clues buried in a health care provider’s notes.
A new study, published January 7 in the journal npj Digital Medicine, suggests that artificial intelligence (AI) could help identify early warning signs, such as memory or thinking problems or changes in behavior, by scanning doctors’ notes for concerning patterns. These may include repeated mentions of cognitive changes or confusion in the patient, or concerns raised by family members attending appointments with their loved one.
“The goal is not to replace clinical judgment, but to act as a screening aid,” study co-author Dr. Lidia Moura, associate professor of neurology at Massachusetts General Hospital, told Live Science. By flagging these patients, the system could help clinicians decide whom to follow up with, especially in settings where there is a shortage of specialists, she said.
Whether this type of screening actually helps patients depends on how it is used, said Julia Adler-Milstein, a health informatics scientist at the University of California, San Francisco, who was not involved in the study. “If the flag is accurate, reaches the appropriate person on the care team, and is actionable, meaning it leads to a clear next step, then yes, it can be easily integrated into the clinical workflow,” she told Live Science via email.
A team of AI agents, not just one
To build the new AI system, the researchers used what they called an “agentic” approach. The term refers to a coordinated set of AI programs (in this case five), each with a specific role and reviewing each other’s work. These collaborative agents worked together to iteratively refine the way the system interpreted clinical notes without human input.
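To make that concrete, here is a minimal sketch of what such a pipeline could look like, with each agent passing its output to the next for review. The role names, the prompts, and the call_llm stub are hypothetical illustrations of the approach as described, not the study’s actual implementation.

```python
# A minimal, hypothetical sketch of an "agentic" review pipeline.
# The role names, prompts, and call_llm() below are illustrative
# assumptions; the study's actual five agents are not detailed here.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    system_prompt: str

def call_llm(system_prompt: str, user_prompt: str) -> str:
    # Placeholder: replace with a real chat call to a local model
    # such as Llama 3.1. Returns a canned reply so the sketch runs.
    return f"[{system_prompt.split()[0]}] reviewed the input."

# Each agent sees the note plus the previous agent's output and refines it.
AGENTS = [
    Agent("extractor", "Quote passages suggesting memory loss, confusion, or behavior change."),
    Agent("classifier", "Given the evidence, answer YES or NO: is a cognitive concern documented?"),
    Agent("critic", "Check the classification against the note; flag unsupported claims."),
    Agent("reviser", "Revise the classification to address the critic's objections."),
    Agent("arbiter", "Issue the final YES/NO label with a short justification."),
]

def review_note(note: str) -> str:
    output = ""
    for agent in AGENTS:
        output = call_llm(agent.system_prompt, f"NOTE:\n{note}\n\nPRIOR OUTPUT:\n{output}")
    return output  # the arbiter's final label and justification
```

Calling review_note on a note string walks it through all five roles in order; in a real deployment, each call would hit the same underlying model with a different system prompt.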
The researchers built the system on Meta’s Llama 3.1 and applied it to three years of physician records, including office-visit notes, progress notes, and discharge summaries. These were drawn from hospital registries and had already been reviewed by clinicians, who noted whether cognitive concerns appeared in each patient’s medical record.
The researchers first showed the AI a balanced set of patient notes, some with documented cognitive concerns and some without, and had it iteratively refine its interpretations to match the clinicians’ labels, forcing the AI to learn from its mistakes. By the end of that process, the system agreed with the clinicians’ labels about 91% of the time.
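One plausible shape for that refinement loop, assuming a classify function backed by the agents and an update_instructions hook that lets them revise their own prompts (both names are hypothetical, not from the paper):

```python
# Hypothetical sketch of the iterative refinement loop: score the system's
# labels against clinician labels on a balanced set, then feed the
# disagreements back so the agents can adjust their own instructions.

def agreement(ai_labels: list[bool], md_labels: list[bool]) -> float:
    return sum(a == m for a, m in zip(ai_labels, md_labels)) / len(md_labels)

def refine(notes, md_labels, classify, update_instructions,
           target=0.91, max_rounds=10):
    score = 0.0
    for _ in range(max_rounds):
        preds = [classify(note) for note in notes]
        score = agreement(preds, md_labels)
        if score >= target:  # the study reports ~91% agreement after refinement
            break
        misses = [(n, p, t) for n, p, t in zip(notes, preds, md_labels) if p != t]
        update_instructions(misses)  # agents revise their prompts, no human input
    return score
```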
The completed system was then tested on a separate, never-before-seen subset of the same three-year dataset. This second set was meant to reflect real-world care, so only about one-third of its records had been labeled by clinicians as indicating cognitive concerns.
In this test, the system’s sensitivity dropped to about 62%, meaning it missed nearly 4 in 10 records that clinicians had marked as positive for signs of cognitive decline.
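Sensitivity here is simply recall over the clinician-labeled positives, which is why roughly 62% sensitivity translates to nearly 4 in 10 missed cases:

```python
# Sensitivity (recall) = true positives / (true positives + false negatives).
def sensitivity(true_pos: int, false_neg: int) -> float:
    return true_pos / (true_pos + false_neg)

# Illustrative counts only: if clinicians flagged 100 records as positive
# and the system caught 62, it missed the other 38.
print(sensitivity(62, 38))  # 0.62 -> ~38% of flagged cases missed
```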
At first glance, this drop in accuracy looked like a failure, until the researchers reexamined the medical records that the AI and the human reviewers had classified differently.
Clinical experts adjudicated these cases by reading the medical records themselves, without knowing whether a given classification came from a clinician or the AI. In 44% of cases, these reviewers ultimately sided with the system’s assessment rather than the physician’s original chart review.
“This was one of the most surprising findings,” said study co-author Hossein Estiri, associate professor of neurology at Massachusetts General Hospital.
In many of these cases, he said, the AI applied clinical definitions more conservatively than the doctors had, declining to flag a concern unless the records directly described memory loss, confusion, or other changes in the patient’s thinking, even when a diagnosis of cognitive decline was listed elsewhere in the record. The AI also surfaced mentions of potential cognitive concerns that doctors may not have considered important at the time.
The results highlight the limits of manual chart review by physicians, Moura said. “If the signal is clear, everyone will recognize it,” she said. “In subtle cases, this is where humans and machines can diverge.”
Karin Verspoor, an AI and medical technology researcher at RMIT University who was not involved in the study, said the system was carefully built and evaluated on a set of doctors’ notes that had been reviewed by clinicians. However, she cautioned that because the data came from a single hospital network, its accuracy may not hold in settings where documentation practices differ.
She added that the system is limited by the quality of the notes it reads, a limitation that can only be addressed by validating the system in diverse clinical settings.
Estiri explained that, for now, the system is intended to run quietly in the background of routine doctor visits, surfacing potential concerns along with an explanation of how it reached them for the doctor to review. However, it has not yet been used in clinical practice.
“The idea is not that doctors are sitting there and using AI tools, but that the system provides insight into what we’re looking at and why we’re looking at it, as part of the clinical record itself,” he said.
Tian, J., Fard, P., Cagan, C. et al. Autonomous agent workflow for clinical detection of cognitive concerns using large language models. npj Digit. Med. 9, 51 (2026). https://doi.org/10.1038/s41746-025-02324-4
