
Routing-Free Mixture-of-Experts

MCML Authors

Abstract

Standard Mixture-of-Experts (MoE) models rely on centralized routing mechanisms that introduce rigid inductive biases. We propose Routing-Free MoE, which eliminates hard-coded centralized designs such as external routers, Softmax, TopK, and load balancing; instead, all activation functionality is encapsulated within the individual experts and optimized directly through continuous gradient flow, so that each expert determines its activation entirely on its own. We further introduce a unified adaptive load-balancing framework that simultaneously optimizes expert-balancing and token-balancing objectives through a configurable interpolation, allowing flexible and customizable resource allocation. Extensive experiments show that Routing-Free MoE consistently outperforms baselines with better scalability and robustness. We analyze its behavior in detail and offer insights that may facilitate future MoE design and optimization.
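
The abstract only sketches the mechanism, so the following is a minimal, hypothetical PyTorch illustration of what a router-free, self-gated expert layer and an interpolated balancing loss could look like. The class and function names (SelfGatedExpert, RoutingFreeMoE, balance_loss), the sigmoid per-expert gate, the variance-based balancing terms, and the alpha interpolation weight are assumptions for illustration only, not the paper's actual formulation.

```python
import torch
import torch.nn as nn


class SelfGatedExpert(nn.Module):
    """One expert that produces both its output and its own activation
    strength from the input, with no external router (hypothetical sketch)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )
        # Per-expert gate: a scalar in (0, 1) decided by the expert itself and
        # trained by ordinary gradient descent (no Softmax, no TopK).
        self.gate = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        a = torch.sigmoid(self.gate(x))   # (tokens, 1) activation weight
        return a * self.ffn(x), a         # gated output, gate value


class RoutingFreeMoE(nn.Module):
    """Sum of self-gated experts; every expert sees every token and decides
    on its own how strongly to respond (illustrative only)."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [SelfGatedExpert(d_model, d_hidden) for _ in range(n_experts)]
        )

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        outs, gates = zip(*(expert(x) for expert in self.experts))
        y = torch.stack(outs).sum(dim=0)   # (tokens, d_model)
        g = torch.cat(gates, dim=-1)       # (tokens, n_experts)
        return y, g


def balance_loss(gates: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Hypothetical interpolation of an expert-balancing term (uniform load
    across experts) and a token-balancing term (uniform activation budget per
    token); `alpha` trades the two objectives off."""
    expert_load = gates.mean(dim=0)    # mean gate per expert
    expert_term = expert_load.var()    # low when experts share the load
    token_load = gates.sum(dim=-1)     # total activation per token
    token_term = token_load.var()      # low when tokens use a similar budget
    return alpha * expert_term + (1.0 - alpha) * token_term
```

In this reading, setting alpha toward 1 would prioritize evenly used experts, while setting it toward 0 would prioritize an even compute budget per token; the paper's configurable interpolation presumably exposes a similar knob.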

Preprint

Apr. 2026

Authors

Y. Liu • J. Han • S. Yan • V. Tresp • Y. Ma

Links

arXiv GitHub

Research Area

A3 | Computational Models

BibTeX Key: LHY+26
