
Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models


Abstract

Large Language Models (LLMs) demonstrate remarkable performance on a variety of natural language understanding (NLU) tasks, primarily due to their in-context learning ability. This ability could be applied to building baby-like models, i.e., models at small scales, improving training efficiency. In this paper, we propose a 'CoThought' pipeline, which efficiently trains smaller 'baby' language models (BabyLMs) by leveraging the Chain of Thought prompting of LLMs. Our pipeline restructures a dataset of less than 100M words using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts that are comparable to school texts for language learners. The BabyLM is then pretrained on this restructured dataset in a RoBERTa fashion. In evaluations across four benchmarks, our BabyLM outperforms the vanilla RoBERTa on 10 linguistic, NLU, and question-answering tasks by more than 3 points, showing a superior ability to extract contextual information. These results suggest that compact LMs pretrained on small, LLM-restructured data can better understand tasks and achieve improved performance.
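The abstract describes the pipeline only at a high level: an LLM first rewrites the raw corpus into task-oriented passages, and a compact model is then pretrained on the result with a RoBERTa-style masked-language-modeling objective. As a rough, non-authoritative sketch of that second stage, the snippet below pretrains a small RoBERTa-style model on an already-restructured corpus with Hugging Face Transformers; the file name restructured.txt, the model size, and all hyperparameters are illustrative assumptions rather than the authors' released setup (see the GitHub link below for that).

```python
# Hypothetical sketch of the pretraining stage of a CoThought-style pipeline:
# a compact RoBERTa-style "BabyLM" trained from scratch on LLM-restructured text.
# Assumes the restructured corpus is already in restructured.txt (one passage
# per line); model size and hyperparameters are illustrative, not the paper's.
from datasets import load_dataset
from transformers import (
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Reuse the standard RoBERTa tokenizer; a real run might train its own vocabulary.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# A deliberately small configuration, in the spirit of a "baby" language model.
config = RobertaConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=256,
    num_hidden_layers=8,
    num_attention_heads=8,
    intermediate_size=1024,
    max_position_embeddings=514,
)
model = RobertaForMaskedLM(config)

# Load and tokenize the restructured corpus.
dataset = load_dataset("text", data_files={"train": "restructured.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# The MLM collator provides the RoBERTa-style masked-token pretraining objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="babylm-cothought",
        num_train_epochs=1,
        per_device_train_batch_size=16,
    ),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```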

inproceedings


BabyLM Challenge @ CoNLL 2023

BabyLM Challenge at the 27th Conference on Computational Natural Language Learning. Singapore, Dec 06-10, 2023.

Authors

Z. Zhang • H. Yang • B. Ma • D. Rügamer • E. Nie

Links

DOI • GitHub

Research Areas

 A1 | Statistical Foundations & Explainability

 B2 | Natural Language Processing

 C4 | Computational Social Sciences

BibTeX Key: ZYM+23a
