Particle.news

Download on the App Store

AI Models GPT-4.5 and LLaMa-3.1 Achieve Milestone in Rigorous Turing Test

OpenAI's GPT-4.5 reached a 73% success rate, surpassing human participants, while Meta's LLaMa-3.1 scored 56%, raising questions about the Turing Test's relevance and societal implications.

GPT 4.5 passes the Turing Test, blurs human-AI lines.
Stock image: A close-up of a smartphone displaying the ChatGPT logo on a white screen, with the same ChatGPT logo shown on a laptop screen on February 19, 2025 in Chongqing, China.
AI digital face of a human
AI’s future lies in practical utility—solving problems, not just being a smart conversationalist. (Getty Images/iStockphoto)

Overview

  • Researchers at UC San Diego conducted a three-party Turing Test where participants conversed with both AI and human counterparts to determine which was human.
  • GPT-4.5, when prompted with a specific persona, was judged to be human 73% of the time, outperforming actual human participants; LLaMa-3.1 achieved a 56% success rate under similar conditions.
  • The study highlights the importance of persona prompts in enhancing AI's ability to mimic human behavior, with models performing significantly worse without such guidance.
  • Critics argue that the Turing Test measures conversational mimicry rather than true intelligence, as these models lack comprehension or consciousness.
  • The findings, published as a preprint awaiting peer review, have sparked concerns about societal impacts, including job automation, social engineering risks, and ethical challenges.