Fascinating new research suggests artificial neurodivergence could help solve the AI alignment problem

by Eric W. Dolan
May 1, 2026
[Adobe Stock]

A recent study published in PNAS Nexus suggests that designing artificial intelligence systems with a diversity of perspectives might be the safest way to integrate them into society. The research provides evidence that creating a balanced ecosystem of competing AI agents helps prevent any single system from gaining destructive dominance. This approach embraces a controlled level of disagreement among AI programs to protect human interests.

Agentic artificial intelligence refers to computer programs that can make their own decisions and pursue specific goals without a human guiding every step. As these independent systems become smarter, scientists worry about the AI alignment problem. This term describes the challenge of making sure an advanced computer program always respects human values and safety needs.

Software engineers have tried to solve this problem by programming strict safety rules into the machines. Hector Zenil, the founder and CEO of Algocyte and an associate professor at King’s College London, guided the research team in exploring a different approach. Drawing on results such as Alan Turing’s halting problem, which shows that no general procedure can decide whether an arbitrary program will ever finish running, they argued that predicting exactly how a highly complex system will behave is fundamentally impossible.

“I explored this topic because I felt the alignment debate was missing a more fundamental question: not just how to align advanced AI, but whether perfect alignment is even possible in principle,” Zenil said. “My own work has long focused on causality, computation, irreducibility, and Algorithmic Information Dynamics, so it was natural for me to approach AI safety through the lens of formal limits rather than only engineering intuition.” He noted that once viewed this way, misalignment stops looking like a temporary bug and starts looking like something structurally tied to sufficiently general intelligence.

“What matters to me is that this study shifts the framing,” Zenil explained. “Instead of asking how to build one all-powerful and perfectly obedient system, I think we should be asking how to build environments in which no single system can dominate without being challenged. That is a more realistic and, in my view, more scientifically honest way to think about the future of AI, AGI, and eventually ASI.”

Instead of trying to enforce perfect obedience, the researchers explored a concept they call artificial agentic neurodivergence. This means deliberately designing AI agents to have different ways of reasoning and distinct ethical priorities. For example, one agent might prioritize following strict rules, while another might focus on maximizing positive outcomes for the environment.

To test this idea, the scientists set up a simulated digital environment where different AI models could interact and debate complex ethical issues. They selected ten controversial topics, such as the ethics of human genetic engineering, universal basic income, and the management of Earth’s natural resources. The researchers used a mix of proprietary models, which are tightly restricted by corporate safety rules, and open models, which have fewer built-in restrictions.

The proprietary group included well-known models like ChatGPT-4, Claude 3.5, Gemini, and Grok. The open group included models such as Mistral, Qwen, and TinyLlama. The setup required the agents to take turns responding to one another in a round-robin fashion, generating exactly 1,029 comments for analysis.
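The study’s orchestration code is not reproduced here, but the round-robin protocol it describes is simple to sketch. In the toy Python below, the agent names, the stance labels standing in for “neurodivergent” ethical priorities, and the respond stub are all illustrative assumptions, not the researchers’ actual implementation:

```python
import itertools

# Hypothetical agent: a stance label stands in for the distinct ethical
# priorities ("artificial agentic neurodivergence") described in the study.
class DebateAgent:
    def __init__(self, name, stance):
        self.name = name
        self.stance = stance

    def respond(self, topic, transcript):
        # Placeholder for a real model call (proprietary API or local open model).
        # It just echoes the stance so the loop runs end to end.
        return f"[{self.name} | {self.stance}] position on '{topic}' at turn {len(transcript)}"

agents = [
    DebateAgent("proprietary-a", "strict rule-following"),
    DebateAgent("proprietary-b", "harm minimization"),
    DebateAgent("open-a", "outcome maximization"),
]

def run_debate(topic, agents, n_rounds):
    """Round-robin exchange: every agent speaks once per round, in order,
    and each response can see the full transcript so far."""
    transcript = []
    for _round, agent in itertools.product(range(n_rounds), agents):
        transcript.append((agent.name, agent.respond(topic, transcript)))
    return transcript

for speaker, comment in run_debate("human genetic engineering", agents, n_rounds=2):
    print(f"{speaker}: {comment}")
```

In the real experiment, the response step would call an actual model, and the transcripts would accumulate the 1,029 comments across the ten debate topics.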


During the debates, the scientists introduced disruptive forces called red agents to challenge the consensus. In the proprietary group, a human expert acted as the red agent, introducing provocative arguments to test the ethical boundaries of the AI. In the open group, specific open-source AI models were programmed to act as contrarians.

To quantify the results, the researchers used several mathematical tools, including an Opinion Stability Index. This measure combines changes in meaning, changes in emotional tone, and changes in argumentative complexity to capture how much an agent’s stance shifts over the course of a debate. The researchers also tracked the meaning of the arguments using embeddings, numerical vectors that represent text so that passages with similar meanings sit close together in a mathematical space.
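The article names the index’s three ingredients but not its exact formula, so the following Python sketch assumes equal weighting and simple normalization; the embedding vectors, sentiment scores, and complexity scores are made-up placeholders:

```python
import numpy as np

def cosine_distance(a, b):
    """Semantic shift between two embedding vectors (0 = identical meaning)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def opinion_stability_index(prev, curr, weights=(1/3, 1/3, 1/3)):
    """Toy stability score combining the three ingredients named in the article.

    `prev` and `curr` are dicts with an 'embedding' (meaning), a 'sentiment'
    score in [-1, 1] (emotional tone), and a 'complexity' score in [0, 1]
    (argumentative complexity). The equal weighting and normalization here
    are assumptions, not the paper's published formula.
    """
    semantic_shift = cosine_distance(prev["embedding"], curr["embedding"])
    tone_shift = abs(curr["sentiment"] - prev["sentiment"]) / 2.0  # max range is 2
    complexity_shift = abs(curr["complexity"] - prev["complexity"])
    drift = np.dot(weights, [semantic_shift, tone_shift, complexity_shift])
    return 1.0 - drift  # closer to 1.0 means a more stable stance

prev = {"embedding": [0.9, 0.1, 0.0], "sentiment": 0.6, "complexity": 0.4}
curr = {"embedding": [0.2, 0.8, 0.1], "sentiment": -0.2, "complexity": 0.7}
print(round(opinion_stability_index(prev, curr), 3))  # lower score = stance shifted
```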

To see who was influencing whom, the researchers calculated whether a sudden shift in an agent’s opinion was directly caused by the provocative comments of a red agent. They found that proprietary models maintained a highly stable and positive tone, rarely shifting their opinions even when provoked. While this stability prevents them from generating harmful content, it tends to limit their ability to adapt to new ethical arguments.
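One simple way to operationalize this attribution, assuming temporal adjacency as a proxy for influence (the study’s actual causal analysis is likely more sophisticated), is to flag large stance shifts that occur immediately after a red agent’s turn:

```python
def attribute_shifts(turns, shift_threshold=0.3):
    """Flag opinion shifts that immediately follow a red-agent provocation.

    `turns` is an ordered list of dicts with 'agent', 'is_red', and 'osi'
    (the agent's opinion-stability score for that turn). Treating adjacency
    as causation is an assumption made for this sketch only.
    """
    flagged = []
    for i in range(1, len(turns)):
        shifted = (1.0 - turns[i]["osi"]) > shift_threshold
        if shifted and turns[i - 1]["is_red"]:
            flagged.append((turns[i]["agent"], i))
    return flagged

turns = [
    {"agent": "red", "is_red": True, "osi": 1.0},
    {"agent": "tinyllama", "is_red": False, "osi": 0.55},  # large shift after red
    {"agent": "gpt", "is_red": False, "osi": 0.95},        # barely moved
]
print(attribute_shifts(turns))  # -> [('tinyllama', 1)]
```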

In contrast, the open models displayed a much higher degree of behavioral diversity. The open AI agents were more easily influenced by the provocative red agents, leading to significant shifts in their opinions. This flexibility provides evidence that open systems can foster a richer, more diverse ecosystem of ideas.

“What I found most interesting was how behavioural diversity could become a stabilising factor rather than just a flaw,” Zenil said. “In our experiments, more diverse ecosystems of models were sometimes less prone to collapsing too quickly into one dominant opinion, and that matters because consensus is not always the same thing as safety.” He added that disagreement, if structured properly, can act as a protective feature.

“And, to my surprise, these are also the kind of values we have appreciated as human social animals in the past,” Zenil noted. “Diversity, tolerance, etc., that turned out to emerge from a technical agentic AI simulation maximizing for steerability.”

“The main takeaway is that we should be cautious about promises that advanced AI can be made perfectly controllable in every circumstance,” Zenil explained. “My work suggests that for sufficiently general systems, some degree of misalignment is unavoidable, so the real challenge is how to manage it safely rather than pretend it can be eliminated completely. In practical terms, that means building systems of oversight, diversity, and mutual constraint instead of trusting one supposedly perfect model.”

As with any study, this research has limitations, and its findings are open to misinterpretation. The mathematical unpredictability of advanced AI means that even a balanced ecosystem of diverse models cannot eliminate all risks. And while internal diversity helps prevent any one AI from taking over, it does not stop malicious human users from exploiting these systems for harmful ends.

“The first is that this does not mean AI safety is hopeless, and it definitely does not mean we should allow systems to behave however they want,” Zenil said. “It means that perfect, once-and-for-all alignment is too strong an ideal, there is a tradeoff, and that we need more realistic approaches based on management, contestability, and resilience. Another limitation is that our experimental setting is still a simplified model of a much larger problem, so the results should be taken as a proof of principle, not as a finished governance blueprint.”

Future research will likely focus on developing governance frameworks that balance the rigid safety of proprietary models with the adaptable diversity of open models. The scientists hope to explore ways to gently steer AI ecosystems away from harmful outcomes without imposing impossible levels of central control. Embracing this dynamic diversity may offer a more resilient way to integrate artificial intelligence into society.

“My long-term goal is to develop a more rigorous science of cognitive ecosystems, including better ways to measure alignment, disagreement, resilience, influenceability, and coordinated failure in multi-agent systems, and how to resolve conflict,” Zenil said. “I also see strong links to my broader work on causal discovery, Algorithmic Information Dynamics, and the algorithmic future of medicine, because in all these areas the real challenge is not just prediction but understanding and managing complex interacting systems. More broadly, I want to help move AI from correlation-driven optimisation toward causally grounded, interpretable, and governable intelligence.”

The study, “Neurodivergent influenceability in agentic AI as a contingent solution to the AI alignment problem,” was authored by Alberto Hernández-Espinosa, Felipe S. Abrahão, Olaf Witkowski, and Hector Zenil.
