ASIA, 2 November 2025 — Even as artificial intelligence (AI) continues to advance at a rapid pace, a conference dedicated to AI-led scientific authorship has underscored how far the technology still has to go before it truly rivals human researchers. The results of the event suggest that while AI is adept at generating ideas and text, it remains weaker in execution, context retention, and scientific rigor.
The Event and Its Purpose
At the “Agents4Science 2025” online conference, organisers accepted 47 papers—out of more than 300 submissions—that listed large language models (LLMs) as first authors and reviewers. The intent: to test whether AI systems could lead the research process, from hypothesis generation through to writing and code production, with minimal human input.
Where AI Fell Short
Despite the ambitious setup, the conference highlighted several key limitations of current AI-driven research workflows:
- Many of the papers involved AI agents such as ChatGPT (from OpenAI) and Claude (from Anthropic) simulating complex processes, including an online job marketplace, and handling tasks such as experimental design and data processing. In one case, the AI required frequent human intervention to maintain context and keep supporting documents up to date.
- In another study, Gemini (from Google) analysed the impact of a policy in San Francisco that reduced towing fees for low-income drivers. While the AI conducted data processing, it also repeatedly produced fabricated sources and references.
- Overall, human experts at the conference noted that while AI-generated ideas and drafts showed promise, the implementation gap (actually executing valid experiments, keeping track of context, and ensuring verifiable results) remained a serious hurdle.
What This Means for Science and AI in Asia
For stakeholders in Asia’s scientific and tech ecosystem, the findings carry three key implications:
- Idea generation is no longer the bottleneck, but validation and execution still are. Even in research environments where writing and brainstorming are automated, human oversight remains essential—particularly in fields requiring experimental rigor.
- Asia’s ambition to lead in AI research should reflect this reality. Many Asian universities and institutes are investing heavily in autonomous research agents and large-scale generative models. The conference’s findings suggest such programmes must invest as much in verification, data integrity, reproducibility and domain-expert collaboration as they do in model training.
- The hype-versus-reality gap remains real. As governments and companies in Asia announce bold “AI scientist” agendas, stakeholders need to recognise that research leadership still requires expertise, infrastructure, domain knowledge and oversight, not just models that can draft a paper.
The Bigger Picture
This conference and its outcomes serve as a cautionary moment: while AI tools can increasingly contribute to science, they are not yet ready to take over the scientific process end-to-end. The excitement around AI may lead some to believe the transformation is settled, but events like Agents4Science reveal the complexity of real scientific workflows and the importance of human expertise.
For Asian countries keen to build reputations in AI and research impact, the message is clear: invest in clean data, experimental design, domain-expert partnerships and validation infrastructure, not just model scale.