A groundbreaking study reveals that artificial intelligence, particularly ChatGPT4, can surpass the average human’s ability to generate ideas in a classic creativity assessment. But while AI chatbots displayed consistently high performance, they did not outperform the most creative human participants. Instead, humans showed a wider range of creative potential, which could be linked to variations in executive functions and cognitive processes.
These new findings have been published in the journal Scientific Reports.
Traditionally, creativity was thought to be a distinctly human trait, driven by complex cognitive processes such as imagination, insight, and the ability to connect seemingly unrelated concepts. Yet, as AI technology continues to advance, it has become increasingly clear that machines are capable of producing creative outputs that rival, and sometimes even surpass, human achievements.
“I think we are living in a very particular historical moment in which how we perceive machines and machine intelligence may radically change. I believe as a scientist that there is a lot of research to be done on how people perceive machines and what are the abilities that machines can currently imitate from humans,” explained study author Simone Grassini, an associate professor at the University of Bergen.
“Until a few decades ago, it would have been difficult to believe that machines could output abilities such as creative behavior, and the field is developing so quickly that it is difficult to predict what will happen in one or two years from now.”
The researchers conducted their study using a common creativity test known as the Alternate Uses Task (AUT). In this task, participants, both human and AI chatbots, were asked to generate unique and creative uses for common objects such as a rope, box, pencil, and candle.
The humans had 30 seconds to generate as many creative ideas as possible. Chatbots, on the other hand, were instructed to generate a specific number of ideas (e.g., 3 ideas) and use only 1-3 words in each response. Each chatbot was tested 11 times.
The study involved three AI chatbots: ChatGPT3, ChatGPT4, and Copy.Ai, along with a group of 256 human participants. The participants, all native English speakers, were recruited from the online platform Prolific and had an average age of 30.4 years, with a range from 19 to 40 years.
The responses from both humans and AI chatbots were analyzed using two main approaches:
- Semantic Distance Scores: This automated method assessed the originality of responses by measuring how different they were from common or expected uses of the objects.
- Subjective Ratings of Creativity: Six human raters, unaware of which responses were generated by AI, were asked to evaluate the creativity of the ideas on a 5-point scale.
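To make the first approach concrete, here is a minimal sketch of how a semantic distance score could be computed. It assumes that the object and each response are represented as numeric word vectors and that distance is one minus their cosine similarity. The article does not describe the scoring tool the researchers actually used, so the function names and the three-dimensional toy vectors below are hypothetical and serve only as illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_distance(object_vec, response_vec):
    """Distance grows as the response moves away from the object in meaning."""
    return 1.0 - cosine_similarity(object_vec, response_vec)


# Hypothetical toy vectors, invented for this example (not real embeddings).
TOY_VECTORS = {
    "rope": [1.0, 0.0, 0.0],
    "tie things": [0.9, 0.1, 0.0],          # a conventional use
    "abstract sculpture": [0.2, 0.9, 0.4],  # a less expected use
}

conventional = semantic_distance(TOY_VECTORS["rope"], TOY_VECTORS["tie things"])
unusual = semantic_distance(TOY_VECTORS["rope"], TOY_VECTORS["abstract sculpture"])

# Under this measure, the less expected use scores as more "original".
print(conventional < unusual)
```

In a real setting the vectors would come from a language model trained on large amounts of text, and the score would depend on which model is chosen.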
The researchers found that the AI chatbots, particularly ChatGPT3 and ChatGPT4, consistently achieved higher semantic distance scores than humans. By this measure, their responses were more original and less conventional than those of the human participants. Human raters also rated the responses of AI chatbots, especially ChatGPT4, as more creative on average than those of human participants.
“According to our results, AI chatbots (such as ChatGPT) are getting quite good at outputting creative answers when asked to perform traditional creative thinking tasks often used in psychological research,” Grassini said.
However, it’s important to note that while AI chatbots performed exceptionally well, they did not consistently outperform the best human performers. In some cases, highly creative individuals among the human participants were able to compete with AI in generating novel and imaginative responses.
“The average machine performs the Alternate Uses Tasks better than the average human. However, the best of the human participants still outperformed all the models we tested,” Grassini told PsyPost.
“This is a remarkable achievement by the AI systems. However, I believe that people should not overestimate what this may mean in the real world. The fact that a machine can perform quite well in a very specific creativity task, does not mean that the machine will perform well in complex jobs that include creativity. Whether or not these ‘skills’ of the machines are transferable to anything in the real world is still to be tested.”
“I prefer to believe that in the future AI as chatbots will help humans in their creative jobs, more than substitute them for these jobs,” Grassini said. “I think we need to start thinking about a future in which humans and AI machines can co-exist without necessarily thinking that the machines will destroy us or will steal all our jobs.”
“However, it is worth noting that the impact of AI in the job market is quite significant, and most likely will increase in the next years. How our society will develop to integrate AI into human jobs is therefore a crucial issue of modernity, and I expect that governments and stakeholders will elaborate guidelines and rules on how machines can be used to replace or aid human work.”
ChatGPT4 was the most creative among the chatbots when subjective ratings were considered.
“A notable finding was that ChatGPT4 (the newest model tested) did not outperform the other AI models when the task was scored using an algorithm to measure semantic distance,” Grassini explained. “However, ChatGPT4 performed generally better than the other models when human raters would score the level of creativity displayed in the responses.”
“This seems to point out that the outputs of ChatGPT4 did not differ compared to the others in the ‘objective’ semantic distance between the proposed item and the creative way to use it. However, ChatGPT4 answers are more ‘appealing’ (i.e., are perceived subjectively more creative) by the human raters.”
Like any scientific study, this research has its limitations. “We only measure one type of creative behavior,” Grassini told PsyPost. “Our results may not be generalizable for creativity as a complex phenomenon.”
The researchers also noted that comparing creativity at process levels between humans and chatbots is challenging, as chatbots are essentially “black boxes” whose internal processes are not fully understood.
“The machine may not ‘display’ creativity in the factual way, but it may have learnt the best answer to that specific task in the training data,” Grassini explained. The task may have assessed the chatbot’s memory rather than its “ability to come up with creative uses of things. Due to the architecture of these models, it is impossible to know.”
The study, “Best humans still outperform artificial intelligence in a creative divergent thinking task,” was authored by Mika Koivisto and Simone Grassini.