When asked to evaluate how good we are at something, we tend to get it badly wrong. This is a near-universal human tendency, and its effects are strongest among those with the least ability. The phenomenon, named the Dunning-Kruger effect after psychologists David Dunning and Justin Kruger, who first studied it, means that people who are worse at a particular task become overconfident, while more capable people tend to underestimate their skills. It is typically revealed through cognitive tests that assess attention, decision-making, judgment, and language.
But now scientists at Finland's Aalto University, along with collaborators in Germany and Canada, have found that when people use artificial intelligence (AI), the Dunning-Kruger effect is largely eliminated, and in fact almost reversed.
As we all become more AI literate thanks to the proliferation of large language models (LLMs), the researchers expected that participants would not only be better at interacting with AI systems, but also better at judging their own performance when using them. "Instead, our findings reveal a significant inability to accurately assess one's own performance when using AI, evenly across our sample," study co-author Robin Welsch, a computer scientist at Aalto University, said in a statement.
Flattening the curve
In the study, the scientists gave 500 participants logical reasoning tasks from a law school entrance exam and allowed half of them to use the popular AI chatbot ChatGPT. Both groups were then quizzed on their AI literacy and asked how well they thought they had performed, with the promise of an extra reward if they assessed their performance accurately.
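To make concrete what "assessing performance accurately" means here, consider a minimal sketch of one plausible calibration measure: the gap between a participant's estimated and actual score. The function name and scoring rule below are illustrative assumptions, not the study's actual method.

```python
# Hypothetical illustration of metacognitive calibration:
# a positive gap indicates overconfidence, a negative gap underconfidence.
def calibration_gap(estimated_correct: int, actual_correct: int) -> int:
    """Difference between self-estimated and actual number of correct answers."""
    return estimated_correct - actual_correct

# Example: a participant who solved 12 of 20 items but believed they solved 16
# shows an overconfidence gap of +4.
print(calibration_gap(estimated_correct=16, actual_correct=12))  # prints 4
```

On a measure like this, a well-calibrated participant scores near zero; the study's reward for accurate self-assessment gave participants an incentive to minimize exactly this kind of gap.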
The reasons behind the findings varied. Most notably, AI users were typically satisfied with a single question or prompt per problem, accepting the chatbot's answer without checking or confirming it further. They were engaging in what Welsch calls "cognitive offloading": interrogating questions with less self-reflection, in a "shallower" way.
Scaling back our involvement in our own reasoning in this way weakens what is known as "metacognitive monitoring," bypassing the normal feedback loops of critical thinking and reducing our ability to gauge our performance accurately.
What's also clear is that, regardless of intelligence, we all overestimate our abilities when using AI, and the gap between high- and low-skilled users narrows. The study attributes this to the fact that LLMs lift everyone's performance to some extent.
Although the researchers did not address it directly, the discovery also comes at a time when scientists are beginning to question whether LLMs in general are too sycophantic. The Aalto team warned that there are several potential implications as AI use becomes more widespread.
First, overall metacognitive accuracy may decline. Relying on AI outputs without rigorously questioning them improves users' performance, but the tradeoff is a weaker sense of how well they actually handled the task. Without reflecting on outputs, checking for errors, and reasoning more deeply, we risk eroding our ability to obtain reliable information, the scientists said in the study.
Furthermore, the flattening of the Dunning-Kruger curve means that we will all keep overestimating our abilities when using AI, with those reporting higher AI literacy the most prone to doing so, raising the likelihood of poor decision-making and skill erosion.
One way the research suggests to counter this decline is for the AI itself to prompt users to ask more questions. Developers could design responses that encourage reflection, literally asking things like "How confident are you in this answer?" or "What did I miss?", or invite further interaction through measures such as confidence scores.
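As a rough illustration of that design idea, here is a minimal sketch of a chat wrapper that appends a reflection prompt to each model reply. The `ask_model` stub and all names here are assumptions for illustration, not the study's system or any vendor's API.

```python
import random

# Reflection prompts of the kind the researchers suggest, appended to answers
# to nudge users toward metacognitive monitoring rather than passive acceptance.
REFLECTION_PROMPTS = [
    "How confident are you in this answer?",
    "What did I miss?",
    "What would you check before relying on this?",
]

def ask_model(question: str) -> str:
    # Hypothetical stand-in for a real LLM call; swap in your provider's API.
    return f"(model answer to: {question})"

def reflective_reply(question: str) -> str:
    """Answer a question, then prompt the user to reflect on the answer."""
    answer = ask_model(question)
    prompt = random.choice(REFLECTION_PROMPTS)
    return f"{answer}\n\n{prompt}"

print(reflective_reply("Which conclusion follows from the premises?"))
```

The design choice is simply to move the burden of reflection from the user to the interface: instead of hoping users interrogate answers on their own, every reply ends with a question that invites a second pass.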
The new research adds weight to the idea that AI training should cover critical thinking as well as technical competency, as recently advocated by the Royal Society. "We…provide recommendations for the design of conversational AI systems that enhance metacognitive monitoring by allowing users to critically reflect on their performance," the scientists wrote.
