ChatGPT passed the Turing test and was mistaken for a human
A new study has shown that modern artificial intelligence models are capable of convincingly mimicking human behavior in conversation, and in some cases, they are difficult to distinguish from real people. The most advanced version of ChatGPT was perceived as human in 73% of cases in the Turing test, which significantly exceeds expectations, according to The Independent.
Artificial intelligence is passing classic “humanity” tests with growing confidence: it no longer simply answers questions but conducts conversations in a way that leaves people unsure who they are talking to. The new research shows that, under the right conditions, modern language models can convince a conversation partner that they are a real person.
The Turing test, proposed by mathematician Alan Turing in 1950 as the “imitation game,” involves a simple scenario: a person communicates with an unknown interlocutor and tries to determine whether it is another person or a machine. The new study, published in the journal Proceedings of the National Academy of Sciences, tested several large language models, including various versions of ChatGPT and Meta’s LLaMa, and compared them with responses from real people.
The results were revealing: participants identified GPT-4.5, the most recent model tested, as human in 73% of cases, which in some of the tests was even higher than the rate at which real human participants were judged to be human. The LLaMa model scored 56%, while GPT-4o, released in 2024, was perceived as human in only 21% of cases. By comparison, even ELIZA, a chatbot from the 1960s, scored 23%, slightly ahead of GPT-4o, which underlines how difficult such assessments are to interpret.
Study author Cameron Jones explained that the right prompts, which define the model’s communication style and role, are the key factor. “We found that with the right prompts, state-of-the-art LLMs can exhibit the same tone, directness, humor, and tendency to make mistakes as humans,” he noted.
The researchers also emphasize that without specific instructions and scenarios, models are much more likely to “give themselves away” and are easily recognized as artificial intelligence. For example, without additional prompts, GPT-4.5 was perceived as human in only 36% of cases, roughly half the 73% it achieved with them.
The researchers believe these results demonstrate not only technological progress but also the complexity of human communication itself, which is becoming increasingly difficult to distinguish formally from machine-generated text.