
As the capabilities of ChatGPT and other text-based generative AI services grow, so do the questions surrounding them. Can AI capture and recreate human writing patterns? Can we truly tell the difference between human and machine writing? According to research by a team from Temple University's Control and Adaptive Behavior Laboratory (CAB Lab), published in Scientific Reports, the answers may be changing quickly.
The idea for the study took shape when Jason Chein, Principal Investigator of Temple's CAB Lab and the paper's lead author, noticed an uptick in internet articles and quizzes on the differences between AI-generated and human-written content. "I started to wonder, 'Are there types of people who are fooled and others who are not?'" recalls Chein. "Could we hypothesize about who's better at [detecting AI] than others? Would we be right about this hypothesis?"
The results indicate that we distinguish AI-generated from human-written text at a level only marginally better than pure guessing, and while certain traits give some people an edge at detecting AI-generated text, they may not be what you expect. For example, you might assume that spending more time browsing on your phone would sharpen your radar for AI. Instead, one of the study's findings was that heavier smartphone and social media use was associated with a higher likelihood of mistaking AI content for human content, suggesting that increased exposure may be making AI-generated content appear more human to users.
"There's a broader societal implication of all this, which is that we should expect that we're going to be living in a world where we can't tell what comes from people and what comes from machines, and we have to make societal adjustments to that," says Chein.
To execute the study, the team (rounded out by students Steven Martinez and Alex Barone) first selected a set of approximately 50-word social media posts, written by humans and posted by reputable publications. Each was picked from a period before generative AI tools were made widely available. To further ensure that these samples were genuinely "human," i.e., that they carried no signature of being AI-generated, each was run through a set of AI detection tools.
For each human-written sample, an AI-written equivalent was generated. For instance, if a human-written post consisted of a 30-word news headline that mentioned heart disease, the team would prompt ChatGPT to write a 30-word news post about heart disease. From there, the team's discussion turned to how people actually use ChatGPT outputs. Notes Chein: "Do [GPT users] take the first result, or do they refine it? Do they cherry-pick? There have been a few studies that have looked at this, and to no one's surprise, when you add that human element, AI suddenly gets a lot better at generating things that feel human. We made the decision not to [refine our results] and see what happened."
Before judging the social media post samples, participants completed a set of tasks meant to broadly assess their executive functioning and nonverbal fluid intelligence. They were also given short questionnaires to gauge their smartphone and social media habits, as well as their cognitive and affective empathy. A total of 194 participants were included in the final sample.
Overall, subjects were 57% accurate in identifying the source of the texts, only slightly better than pure chance. Empathy and executive functioning scores had little to no bearing on how participants performed. However, nonverbal fluid intelligence, as measured by a test known as Raven's Standard Progressive Matrices, proved to be strongly associated with the ability to differentiate human and AI texts.
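To put the 57% figure in perspective, here is a minimal sketch, not drawn from the paper itself, of how one might check whether an observed accuracy differs from the 50% expected under pure guessing, using an exact one-sided binomial test. The trial count of 100 judgments is a hypothetical placeholder; the study's actual design may differ.

```python
# Illustrative only: exact one-sided binomial test of accuracy vs. chance.
# The number of judgments (n = 100) is a hypothetical assumption, not a
# figure from the study.
from math import comb

def binomial_p_one_sided(successes: int, n: int, p: float = 0.5) -> float:
    """Probability of observing at least `successes` correct answers
    out of `n` trials if each trial is a coin flip with success rate p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))

n = 100        # hypothetical number of judgments per participant
correct = 57   # 57% accuracy, as reported in the study
p_value = binomial_p_one_sided(correct, n)
print(f"p = {p_value:.3f}")
```

Under these assumptions the p-value is roughly 0.1, illustrating why 57% over a modest number of judgments is only weakly distinguishable from guessing, even though the effect is real across a larger pooled sample.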
Of course, as generative AIs develop and adapt, the cues that we can identify will continuously shift and shrink. Still, Chein is excited to see how this information helps to direct future research. "Right now, we're seeing that there is a discernible signal, but it's very likely that, as these generative AIs improve, we're going to move closer and closer to just guessing," explains Chein. "The question that we're now working on is whether we can encourage someone to identify the right kind of information, choose the right mental strategy or invest the right kind of cognitive effort into being able to tell the difference."
The study also found that subjects who couldn't tell the difference between human and AI text were more likely to share information online. In our current media landscape, as concerns about the spread of false information continue to mount, this could be alarming. However, Chein is quick to note that the emergence of generative AI and the dissemination of "fake news" are still separate issues.
"It's easy to fall into the trap of thinking that something that comes from an artificial intelligence source is necessarily fictive. People produce fake things and erroneous things, and AI systems, because they borrow from a wealth of prior information, often produce things that are true," says Chein. "But there is a connection point here, which is that generative AIs have made it a lot easier to produce and distribute fake information."