19.06.2026
FIFA World Cup: How Well Can AI Predict Sports Results?
In a New Project, LMU Researchers Are Putting Different Large Language Models Head-to-Head to Find Out Which One Delivers the Most Accurate Predictions.
Who will win the FIFA World Cup 2026? LLM SoccerArena is a new project that is kicking off shortly at LMU, in collaboration with researchers from the University of Cologne and Paderborn University. Using this year’s FIFA World Cup as a real-world testing ground, it will evaluate the performance of large language models (LLMs) in practical decision-making scenarios. The platform will test models like GPT, Claude, and Mistral on their ability to predict soccer matches and tournament outcomes. Results will be displayed on an online leaderboard that will be updated daily.
©Florian Generotzky / LMU
Stefan Feuerriegel develops artificial intelligence applications for management.
Depending on the specific question and the provider, the AI models are not always in agreement: “GPT-5.5 from OpenAI and Claude Opus 4.8 currently predict that Spain will win the World Cup, while Mistral Large has picked France,” explains MCML PI Stefan Feuerriegel, Professor at the LMU Munich School of Management and head of the project. “Such differences are scientifically interesting because they can provide clues as to what information the models are using – and whether training data, online opinions, or linguistic and regional distortions play a role.”
Soccer as a Reality Check
What sounds at first like an office betting pool is actually a realistic benchmark from a scientific perspective. Unlike many abstract test tasks, the predictions can subsequently be checked against reality. For a language model to predict, for example, which national side will win the World Cup, it has to assess information about current form, injuries, managerial decisions, past encounters, squad quality, and betting odds, and then derive a reliable prediction from that data under conditions of uncertainty.
Many established benchmarks for large language models test abstract tasks in highly simplified or static environments. The latest models are often very good at solving medical exam questions, legal exercises, and MBA tests. Yet such tasks are limited in what they can tell us about how reliably models act in real decision-making situations under conditions of uncertainty. This is precisely where LLM SoccerArena comes in: the models make predictions whose quality can later be measured against the actual results.
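To illustrate what measuring predictions against actual results can look like, here is a minimal sketch in Python. The article does not say which metric LLM SoccerArena uses. The Brier score below is an assumption chosen only for illustration, and the model names, match IDs, and probabilities are invented.

```python
from collections import defaultdict

# Possible outcomes of a single match (hypothetical encoding).
OUTCOMES = ("home", "draw", "away")


def brier_score(forecast, actual):
    """Multi-class Brier score for one match: 0 is perfect, 2 is worst."""
    return sum(
        (forecast[o] - (1.0 if o == actual else 0.0)) ** 2 for o in OUTCOMES
    )


def leaderboard(predictions, results):
    """Rank models by mean Brier score over matches with a known result."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for model, match_id, forecast in predictions:
        if match_id not in results:
            continue  # match not played yet, so nothing to score
        totals[model] += brier_score(forecast, results[match_id])
        counts[model] += 1
    means = {model: totals[model] / counts[model] for model in counts}
    # Lower mean score is better, so sort in ascending order.
    return sorted(means.items(), key=lambda item: item[1])


# Invented example data: (model, match, predicted probabilities).
predictions = [
    ("model_a", "match_1", {"home": 0.6, "draw": 0.25, "away": 0.15}),
    ("model_b", "match_1", {"home": 0.3, "draw": 0.3, "away": 0.4}),
    ("model_a", "match_2", {"home": 0.2, "draw": 0.3, "away": 0.5}),
    ("model_b", "match_2", {"home": 0.5, "draw": 0.3, "away": 0.2}),
]
results = {"match_1": "home", "match_2": "away"}

ranking = leaderboard(predictions, results)
for model, score in ranking:
    print(f"{model}: {score:.4f}")
```

A score of this kind rewards well-calibrated probabilities rather than only correct picks, so a model that hedges sensibly can outrank one that is confidently wrong.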
Help With Economic Questions
The findings are also relevant for management research. Managers are increasingly using large language models to structure market information, evaluate scenarios, and prepare forecasts – on things like demand trends, competitors, product launches, and risks.
“For management research, it’s critically important whether language models can reliably support real-world decision-making,” says Stefan Feuerriegel. “This is precisely why we need benchmarks that don’t just test abstract tasks, but how the models handle dynamic information, uncertainty, and subsequently verifiable results.”
LLM SoccerArena compares different approaches. Models make predictions partly based on their internal knowledge. But the platform also tests how well they can retrieve and process additional external information from the internet. This agentic search is challenging: Does a model check for current injuries, starting lineups, recent form, changes of manager, head-to-head records, tournament context, and betting odds? And can it meaningfully weigh all this information?