Large-Scale Pretraining: The Nitty-Gritty Details

Robert Baldock, Aleph Alpha

21.02.2024

2:15 pm - 3:45 pm

LMU Munich, Department of Statistics and via Zoom

This talk will give a rare close-up of the nitty-gritty details that go into training large language models (LLMs) at scale. In autumn 2023, Aleph Alpha Research Lab prepared to train its next generation of large language models, which are now in training.

In this talk, Robert Baldock will chronicle the lessons learned from this process. In particular, he will describe the team's experiments to optimise the architecture and pretraining, their optimal scaling study, insights into efficient and numerically stable parallel training, tokenizer construction, and the preparation of the large-scale web-crawl dataset.

#lecture
