A Scalable and Explainable Approach to Discriminating Between Human and Artificially Generated Text

This project will use existing datasets for paraphrasing, abstractive summarization, dialogue generation, and story generation to build parallel datasets of human- and machine-generated texts automatically and at scale. For paraphrases, we will sample examples from the MRPC corpus, a corpus of human-generated paraphrase pairs, and feed the sampled sentences to natural language generation (NLG) models to create artificial counterparts. We will adopt a similar procedure for abstractive summarization (relying on the CNN/Daily Mail corpus), for prompt-based dialogue generation (with prompts from the DailyDialog corpus), and for story generation (Fan et al., 2018). This will yield a large, highly controlled dataset of parallel human- and machine-generated examples.
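As a rough illustration of the pairing procedure, the sketch below samples (source, human text) pairs and attaches a machine-generated counterpart to each one. It is a minimal sketch under stated assumptions: the names `ParallelExample` and `build_parallel` are hypothetical, and `generate` stands in for whichever NLG model is used; none of these come from the project itself.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class ParallelExample:
    """One human text and its machine-generated counterpart (hypothetical schema)."""
    task: str      # e.g. "paraphrase", "summarization", "dialogue", "story"
    source: str    # input shown to both the human writer and the model
    human: str     # human-written output taken from the existing corpus
    machine: str   # output produced by the NLG model for the same source


def build_parallel(
    task: str,
    pairs: Iterable[Tuple[str, str]],
    generate: Callable[[str], str],
    n: int,
    seed: int = 0,
) -> List[ParallelExample]:
    """Sample up to n (source, human) pairs and generate a machine text for each.

    `generate` is an assumed placeholder for a call to an NLG model.
    """
    rng = random.Random(seed)          # fixed seed keeps the sample reproducible
    pool = list(pairs)
    sampled = rng.sample(pool, min(n, len(pool)))
    return [
        ParallelExample(task=task, source=src, human=human, machine=generate(src))
        for src, human in sampled
    ]
```

With a real corpus, `pairs` would hold, for example, the two sentences of each MRPC paraphrase pair, and `generate` would wrap a model call that paraphrases the first sentence.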

We will then train classifiers to discriminate between human- and machine-generated sentences. Specifically, interpretable linguistic and cognitive features extracted with SentSpace (Tuckute et al., 2022) and TextDescriptives (Hansen et al., 2023) will serve as inputs to tree-based models (XGBoost), and we will inspect the models' SHAP values to infer feature importance. We will also test previous hypotheses on geometric and information-theoretic properties of machine-generated text, which suggest that model-generated texts are more similar to one another and more repetitive than texts produced by humans. Finally, we will test human performance on the same task in an online experiment, to assess how well humans can discriminate between the two kinds of text and, by relating human performance to parametric variation in feature values, to evaluate which text features inform human heuristics.
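The repetitiveness and similarity hypotheses could be operationalized in many ways. The sketch below shows two deliberately simple proxies, assuming whitespace tokenization: the share of n-grams in a text that occur more than once, and the mean pairwise Jaccard overlap between the token sets of a group of texts. Both function names and both measures are illustrative assumptions, not the metrics the project commits to.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repetition_rate(text: str, n: int = 2) -> float:
    """Fraction of n-gram occurrences whose n-gram appears more than once."""
    grams = ngrams(text.lower().split(), n)
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)


def mean_pairwise_jaccard(texts: Sequence[str]) -> float:
    """Mean Jaccard overlap between token sets, over all pairs of texts."""
    token_sets = [set(t.lower().split()) for t in texts]
    scores = []
    for a, b in combinations(token_sets, 2):
        union = a | b
        # Two empty texts share nothing measurable; score them as 0.
        scores.append(len(a & b) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

Under the hypotheses above, one would expect both values to be higher, on average, for the machine-generated side of the parallel dataset than for the human side; whether that holds is an empirical question.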