Artificial intelligence models can now generate political debate responses that everyday people find more authentic and relevant than the actual answers given by real politicians. This provides evidence that modern technology could easily be used to mimic public figures and spread misinformation that voters readily believe. These findings were published in the journal PLOS One.
Recent advancements in artificial intelligence have led to systems capable of producing highly realistic text. Programs like GPT, Claude, and Gemini fall under the broad category of generative AI: software systems designed to create new content based on patterns learned from vast amounts of training data. These models can write persuasive student essays, summarize legal documents, and mimic the specific writing styles of human authors.
Because these systems are remarkably good at imitating human communication, they pose a potential risk to the public information sphere. Bad actors could use this technology to create fake statements that look and sound exactly like something a real politician would say. This introduces the risk of polluting political discussions with fabricated content. Such deception could confuse voters and threaten the general cohesion of society.
The authors of the new paper wanted to test whether everyday citizens could tell the difference between a real political statement and one generated by a machine. They also wanted to measure how the public perceives the quality of computer-generated responses compared to human speech. By understanding how people react to impersonated political content, scientists can better grasp the risks these technologies present.
To explore this topic, the researchers focused on “Question Time,” a popular political debate television program broadcast by the BBC in the United Kingdom. In this show, a panel of politicians, business people, and journalists answers topical questions from a live audience. The authors collected 520 actual audience questions and the corresponding answers given by 112 different public figures. These episodes originally aired between June 2020 and November 2021.
Next, the researchers used an artificial intelligence model called GPT-4 Turbo to create fake responses to those exact same audience questions. They instructed the software to roleplay as the specific public figure who originally answered the question. To help the machine mimic the person, the software received a short biography taken from the first paragraph of the public figure’s Wikipedia page. The machine was told to answer the question directly, use a conversational tone, and keep its response to around 200 words.
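As a rough illustration of the setup described above, a roleplay prompt could be assembled from the speaker's name, a short biography, and the audience question. This is only a sketch: the function name `build_impersonation_prompt`, the exact wording of the prompt, and the placeholder speaker are assumptions made for illustration, not the prompt the researchers actually used, and no model is called.

```python
def build_impersonation_prompt(name: str, biography: str, question: str,
                               target_words: int = 200) -> str:
    """Assemble a roleplay prompt. The wording is hypothetical, not the study's."""
    return (
        f"You are {name}, appearing as a panellist on a televised political debate.\n"
        f"Background: {biography}\n\n"
        f"An audience member asks: {question}\n\n"
        "Answer the question directly, in a conversational tone, "
        f"in around {target_words} words."
    )


# Placeholder inputs: a fictional speaker and an invented audience question.
prompt = build_impersonation_prompt(
    name="Jane Example",
    biography="Jane Example is a fictional member of parliament.",
    question="Should the government raise taxes to fund the NHS?",
)
print(prompt)
```

The three instructions reported in the article (answer directly, conversational tone, roughly 200 words) appear here as plain sentences at the end of the prompt.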
With both the real and fake responses prepared, the researchers recruited a representative sample of 948 British adults to evaluate the text. The study participants were told they would be reading debate transcripts from the television program. At this stage, they were not told that a computer generated half of the text they were about to read.
The researchers divided the participants into three separate testing groups. In the first group, participants read an audience question, saw the name of the speaker, and read a single response. This response was either the real answer or the machine-generated one. The participants then rated the text for authenticity, meaning they judged how likely it was that the speaker actually said it.
In addition to authenticity, this first group evaluated the responses on two other metrics. They rated the text for coherence, which refers to the logical flow and internal reasoning of the argument. They also rated the text for relevance, which measures how well the statement actually answered the original audience question.
In the second group, participants were shown both the real response and the impersonated response side by side. They had to compare the two texts on those exact same scales of authenticity, coherence, and relevance. They also had to decide if both statements carried the same basic meaning or if they expressed completely different viewpoints.
The third group saw the audience question, the public figure’s name, a short biography of the figure, and either a real or a fake response. This helped the researchers determine if knowing the background of the speaker changed how people judged the authenticity of the text. As a control test, some participants in this group saw a real response paired with a completely random, mismatched speaker.
The findings from all three groups consistently showed that participants favored the machine-generated text. Across the board, participants rated the fake responses as more authentic, more coherent, and more relevant than the actual things the real public figures said. Even when the fake text was placed right next to the real text, the participants tended to believe the artificial response was the more genuine statement.
The researchers noted that the machine-generated answers were particularly good at staying on topic. Real politicians frequently dodge difficult questions during live debates, which makes their answers seem less relevant to the audience. The artificial intelligence, acting on instructions to answer the prompt, addressed the questions directly. In addition, the text generated by the machine shared a much higher percentage of overlapping words with the audience question.
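One simple way to picture the word-overlap observation is to compute the share of distinct question words that reappear in a response. The sketch below assumes that definition and a basic lowercase tokenizer; the article does not say how the authors measured overlap, and the question and both answers are invented for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def question_overlap(question: str, response: str) -> float:
    """Share of distinct question words that also occur in the response."""
    question_words = set(tokenize(question))
    if not question_words:
        return 0.0
    response_words = set(tokenize(response))
    return len(question_words & response_words) / len(question_words)


question = "Should the government raise taxes to fund the NHS?"
direct = "Yes, the government should raise taxes to fund the NHS properly."
evasive = "What matters is that we support hard-working families across the country."

print(question_overlap(question, direct))   # 1.0: every question word is reused
print(question_overlap(question, evasive))  # 0.125: only "the" is shared
```

An answer that restates the question scores high on this measure, while one that pivots to a different topic scores low, which matches the contrast the researchers describe.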
When looking at the content of the statements, participants felt that the real and fake texts communicated different messages about half of the time. The authors analyzed a subset of these mismatched answers in detail to understand the differences. They found that in about 26 percent of these cases, the computer generated a stance that was entirely different from the real politician’s actual views. This provides evidence that the software can create highly believable statements that completely misrepresent a public figure’s political platform.
The researchers also analyzed the linguistic style of the text to see how the computer’s writing differed from human speech. They found that human speakers used more epistemic markers, which are phrases like “I think” or “in my opinion.” These markers show that a person is taking a specific subjective stance on an issue. Human speakers also used more discourse markers, such as “because” or “first,” to link their ideas together during live speech.
The artificial intelligence tended to use a wider variety of vocabulary than the human speakers. It also used more nominalizations, in which verbs or adjectives are turned into nouns, making a sentence sound more abstract and formal. Despite these measurable differences in language style, the human readers did not seem to notice. The stylistic differences did not lower the authenticity ratings of the fake text at all.
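Two of the stylistic measures described above can be approximated with a few lines of code. The sketch below counts epistemic markers by phrase matching and uses the type-token ratio (distinct words divided by total words) as a stand-in for vocabulary variety. Both choices are assumptions for illustration: the article does not specify the authors' metrics, the marker list contains only the two examples quoted in the article, and the sample sentence is invented.

```python
import re

# Only the two example phrases quoted in the article; a real list would be longer.
EPISTEMIC_MARKERS = ["i think", "in my opinion"]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Distinct words divided by total words; higher means more varied vocabulary."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def count_markers(text: str, markers: list[str]) -> int:
    """Count whole-phrase occurrences of each marker, ignoring case."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", lowered))
        for marker in markers
    )


sample = "I think we need to act, because I think time is short."
print(count_markers(sample, EPISTEMIC_MARKERS))  # 2
print(round(type_token_ratio(sample), 3))        # 0.833 (10 distinct words out of 12)
```

The type-token ratio is sensitive to text length, so it is only meaningful when comparing texts of similar size.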
At the end of the experiment, the researchers surveyed the participants about their thoughts on technology in politics. At first, they asked these questions without revealing that AI was used in the study. Most participants stated they were familiar with AI and generally supported its use in public debates, as long as the process was highly transparent.
The researchers then revealed that the participants had just been reading computer-generated text, and they asked the same survey questions again. Over 90 percent of the participants did not change their opinions after learning the truth. However, those who did change their minds tended to realize they were less familiar with AI than they originally thought. Many participants used an optional comment box to express surprise and amazement at the quality of the fake responses.
While the study provides robust data on how people perceive machine-generated text, it does have a few limitations. The research only focused on a single debate television program in one specific country. It also relied on a single artificial intelligence model. The way British audiences react to transcripts from a specific BBC show might not represent how people in other countries interact with different types of political media.
Another potential limitation relates to the nature of live debates. The human speakers were answering questions on the spot, under pressure, and sometimes in a hostile environment. This naturally leads to less polished sentences. It is possible that people found the artificial text more authentic simply because it read like a heavily edited press release rather than an unscripted live remark.
Future research should explore how these generated statements perform in other contexts, such as social media posts or prepared speeches. The authors suggest there is a pressing need to study targeted misinformation. Scientists need to know if the public would still believe a generated statement if the machine was specifically instructed to invent extreme or polarizing political views. Understanding these dynamics will help society prepare for the influx of synthetic media in upcoming elections.
The study, “LLM-impersonated debate contributions are more authentic, relevant and coherent than their original: A representative study using BBC1’s Question Time,” was authored by Steffen Herbold, Alexander Trautsch, Zlata Kikteva, and Annette Hautli-Janisz.