We measure how much one extra recurrence is worth to a looped (depth-recurrent) language model, in equivalent unique parameters. From an iso-depth sweep of 116 pretraining runs across recurrence counts r∈{1,2,4,8} spanning ∼50× in training compute, we fit a joint scaling law L = E + A(N_once + r^φ N_rec)^(−α) + B D^(−β) and recover a new recurrence-equivalence exponent, φ=0.46. Intuitively, φ tells us whether looping a block r times is equivalent in validation loss to r unique blocks of a non-looped model (full equivalence, φ=1) or to a single block run repeatedly with no capacity gain (φ=0). Our φ=0.46 sits in between: looping adds real capacity, but sublinearly, so at matched training compute each additional recurrence predictably increases validation loss relative to spending that compute on unique parameters. For example, at r=4 a 410M looped model performs on par with a 580M non-looped model, but incurs the training cost of a 1B non-looped one. We demonstrate the utility of φ as a measurement tool with two probes. Truncated backpropagation lowers φ to 0.38, indicating that the loop mechanism is poorly trained under truncation, even though validation loss decreases. Conversely, hyperconnections raise φ to 0.65, a genuine capacity gain. Our method applies to any looped LM and separates true improvements to the loop from gains that merely come from the token budget.
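A minimal sketch of the parameter-equivalence arithmetic implied by the fitted law, assuming the scaling-law form above. Only φ=0.46 comes from the abstract; the 220M/190M split between once-executed and recurrent parameters is hypothetical, chosen so the numbers land near the abstract's 410M / 580M / 1B example.

```python
# Sketch only: parameter-equivalence arithmetic implied by
# L = E + A*(N_once + r**phi * N_rec)**(-alpha) + B*D**(-beta).
# All constants below are illustrative except phi = 0.46.

PHI = 0.46  # recurrence-equivalence exponent reported above


def loss_equivalent_params(n_once: float, n_rec: float, r: int, phi: float = PHI) -> float:
    """Effective parameters entering the loss term: N_once + r**phi * N_rec."""
    return n_once + (r ** phi) * n_rec


def compute_equivalent_params(n_once: float, n_rec: float, r: int) -> float:
    """Non-looped parameter count with the same per-token training cost,
    since the recurrent block is executed r times per forward pass."""
    return n_once + r * n_rec


if __name__ == "__main__":
    # Hypothetical split of a ~410M-parameter looped model (not given in the paper).
    n_once, n_rec, r = 220e6, 190e6, 4
    print(f"unique parameters:          {(n_once + n_rec) / 1e6:.0f}M")                               # ~410M
    print(f"loss-equivalent (phi=0.46): {loss_equivalent_params(n_once, n_rec, r) / 1e6:.0f}M")       # ~580M
    print(f"compute-equivalent:         {compute_equivalent_params(n_once, n_rec, r) / 1e6:.0f}M")    # ~980M (~1B)
```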
BibTeXKey: SRK26a