Public’s confidence in its ability to evaluate AI-generated text is cause for concern, says COM researcher

March 25, 2024

More Americans are adopting tools such as ChatGPT, Gemini and Claude, but a new opinion survey suggests that their confidence in their own ability to evaluate the accuracy, reliability, completeness, and biases of text generated by artificial intelligence is cause for concern, according to the researcher who led the study.

Yi Grace Ji, assistant professor at Boston University’s College of Communication and the primary investigator of the survey, conducted in partnership with Ipsos, said the average result – a mean score of 3.26 out of 5, where 5 indicates strong agreement that one can perform a set of specified tasks in critically evaluating AI-generated responses – is worrisome, especially because respondents tend to overestimate their own abilities.

“In one of my classes, this would barely be a passing grade,” Ji said. “That is a real cause of concern to me, in light of the speedy adoption of AI text generation tools and their capacity for spreading misinformation and undermining critical thinking.”

“Many individuals use these tools as advanced search engines for study or work,” she added, “but the data used to train AI models can be outdated and contain biases, causing the generation of inaccurate, incomplete, and biased information. More and more online information, including political ads, is created using Gen AI tools, and some individuals can be more prone and susceptible to inaccurate and biased information that may appear in that content.”

Survey respondents rated themselves 3.28 out of 5 on the statement, “I can evaluate the accuracy of the responses” from AI text generators. Scores were similar for statements measuring the reliability and completeness of responses and the ability to identify errors in them, but lower for recognizing and explaining bias in results.

Not surprisingly, Ji said, self-rated ability to evaluate AI text generators was higher among respondents with higher incomes, as well as among younger adults, city dwellers, and the college-educated.

Survey Summaries:

How much do you agree/disagree with the following statements regarding your experience with AI text generators, such as ChatGPT, Gemini, Claude, etc.?


Strongly Disagree: 1
Disagree: 2
Neither Agree Nor Disagree: 3
Agree: 4
Strongly Agree: 5

Average Scores for Each Item:

I can evaluate the accuracy of their responses: 3.28
I can evaluate the reliability of their responses: 3.23
I can identify errors in their responses: 3.32
I can evaluate the completeness of their responses: 3.28
I can recognize and explain bias in their responses: 3.18

Aggregated Average Score from the Five Items: 3.26

About the Media & Technology Survey:

The Media & Technology Survey is an ongoing project of the Communication Research Center (CRC) at Boston University’s College of Communication, in partnership with Ipsos, the market research company. This month’s poll was conducted in English from March 12 to 13, 2024, using Ipsos eNation Omnibus, a nationally representative online survey that measures the attitudes and opinions of 1,004 adults across the United States. The survey has a credibility interval (CI) of plus or minus 3.5 percentage points. The data were weighted to U.S. population data by region, gender, age and education. Statistical margins of error are not applicable to online polls. All sample surveys and polls may be subject to other sources of error, including, but not limited to, coverage error and measurement error.