
Leveraging (Sentence) Transformer Models With Contrastive Learning for Identifying Machine-Generated Text


Abstract

This paper outlines our approach to SemEval-2024 Task 8 (Subtask B), which focuses on discerning machine-generated text from human-written content while also identifying the text source, i.e., which Large Language Model (LLM) generated the target text. Our detection system is built upon Transformer-based techniques, leveraging various pre-trained language models (PLMs), including sentence transformer models. Additionally, we incorporate Contrastive Learning (CL) into the classifier to improve its detection capabilities, and we employ data augmentation methods. Ultimately, our system achieves a peak accuracy of 76.96% on the competition test set, using a configuration that combines a sentence transformer model with the CL methodology.
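To make the contrastive-learning component concrete, the following is a minimal sketch of a supervised contrastive objective of the kind the abstract alludes to: sentence embeddings of texts from the same source (e.g., the same LLM) are pulled together, while embeddings from different sources are pushed apart. The function name, temperature value, and NumPy formulation here are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.07):
    """Illustrative supervised contrastive loss over sentence embeddings.

    embeddings: (n, d) array, e.g. sentence-transformer outputs.
    labels:     (n,) integer source labels (which LLM, or human).
    Returns the mean negative log-probability of positives per anchor.
    """
    # L2-normalise so dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Exclude each anchor from its own softmax (exp(-inf) = 0)
    sim = np.where(self_mask, -np.inf, sim)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    # Positives: other samples sharing the anchor's source label
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    per_anchor = (
        np.where(positives, log_prob, 0.0).sum(axis=1)
        / np.maximum(positives.sum(axis=1), 1)
    )
    return -per_anchor.mean()
```

In practice such a loss would be computed on a PLM's pooled sentence embeddings and combined with a standard classification head, but the exact combination used by the system is not detailed in the abstract.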



SemEval @NAACL 2024

18th International Workshop on Semantic Evaluation at the Annual Conference of the North American Chapter of the Association for Computational Linguistics. Mexico City, Mexico, Jun 16-21, 2024.

Authors

H. Chen • J. Büssing • D. Rügamer • E. Nie


Research Areas

 A1 | Statistical Foundations & Explainability

 B2 | Natural Language Processing

BibTeX Key: CBR+24
