Ratings of overall quality are lower when participants are told the poem is AI-generated than when they are told it is written by a human poet (two-sided Welch’s t(4571.552) = –17.398, p < 0.0001, pBonf < 0.0001, mean difference = –0.814, Cohen’s d = –0.508, 99.5% CI –0.945 to –0.683), confirming earlier findings that participants are biased against AI authorship2,7,15. However, contrary to earlier work14,16,17, we find that ratings of overall quality are higher for AI-generated poems than for human-written poems (two-sided Welch’s t(6618.345) = 27.991, p < 0.0001, pBonf < 0.0001, mean difference = 1.045, Cohen’s d = 0.671, 99.5% CI 0.941 to 1.150); Fig. 1 compares the rating distributions for AI-generated and human-written poems. The same pattern – ratings that are significantly lower when participants are told a poem is AI-generated, yet significantly higher when the poem actually is AI-generated – holds for 13 of our 14 qualitative ratings.
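For readers unfamiliar with the reported quantities, the sketch below shows how a two-sided Welch’s t-test (with its characteristic non-integer Welch–Satterthwaite degrees of freedom, as in t(4571.552) above) and a pooled-SD Cohen’s d are computed. The ratings here are synthetic, illustrative values only, not the study’s data; the group sizes, means, and spreads are assumptions chosen solely to exercise the formulas.

```python
import math
import random

def welch_t(a, b):
    """Welch's t statistic and Welch–Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2a, se2b = va / na, vb / nb                  # squared standard errors
    t = (ma - mb) / math.sqrt(se2a + se2b)
    # Welch–Satterthwaite approximation: df is generally non-integer
    df = (se2a + se2b) ** 2 / (se2a ** 2 / (na - 1) + se2b ** 2 / (nb - 1))
    return t, df

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / sp

# Synthetic example: ratings under the "told AI" vs. "told human" labels.
random.seed(0)
told_ai = [random.gauss(4.2, 1.6) for _ in range(2400)]     # hypothetical
told_human = [random.gauss(5.0, 1.5) for _ in range(2400)]  # hypothetical

t, df = welch_t(told_ai, told_human)
d = cohens_d(told_ai, told_human)
print(f"t({df:.3f}) = {t:.3f}, Cohen's d = {d:.3f}")
```

With these illustrative parameters the test recovers the qualitative shape of the first result: a strongly negative t with a negative effect size when the "told AI" group is rated lower.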