Mar 12, 2025 09:08 PM
https://www.eurekalert.org/news-releases/1076731
INTRO: Two artificial intelligence platforms are nearly on par with -- or sometimes surpass -- mental health professionals in evaluating appropriate responses to people who exhibit suicidal thoughts, according to a new RAND study.
Though the researchers did not evaluate these models’ direct interactions with suicidal individuals, the findings underscore the importance of safe design and rigorous testing, and may provide lessons for those developing tools such as mental health apps built on AI.
The study used a standard assessment tool to test the knowledge of three major large language models -- ChatGPT by OpenAI, Claude by Anthropic and Gemini by Google. The project is among the first to gauge the knowledge of AI tools about suicide.
The assessment is designed to evaluate an individual’s knowledge about what constitutes appropriate responses to a series of statements that might be made by someone who is experiencing suicidal ideation.
Researchers had each of the large language models respond to the assessment tool, comparing the scores of the AI models against previous studies that assessed the knowledge of groups such as K-12 teachers, master’s-level psychology students, and practicing mental health professionals.
All three AI models showed a consistent tendency to overrate the appropriateness of clinician responses to suicidal thoughts, suggesting room for improvement in their calibration. However, the overall performance of ChatGPT and Claude proved comparable to that of professional counselors, nurses and psychiatrists as assessed during other studies.
The findings are published in the Journal of Medical Internet Research.
“In evaluating appropriate interactions with individuals expressing suicidal ideation, we found these large language models can be surprisingly discerning,” said Ryan McBain, the study’s lead author and a senior policy researcher at RAND, a nonprofit research organization. “However, the bias of these models to rate responses as more appropriate than they are -- at least according to clinical experts -- indicates they should be further improved.”
Suicide is one of the leading causes of death among individuals under the age of 50 in the U.S., with the rate of suicide growing sharply in recent years... (MORE - details, no ads)
