How I Rank the Teams
A ranking with a hidden formula is just punditry with extra steps. Here's the whole method — what the two ratings measure, the actual weights, how confident the model is, and the honest ceiling on how much any of this can explain. Audit it; that's the point.
Calibrated across 128 matches · the same method drives every rankings page · data as of 2022-12-18
Two ratings, one engine
I rate every team two ways off the same engine, and the gap between the two is where it gets interesting.
The results rating is fed the actual result: who won, and by how much. A bigger win moves a team more, but with diminishing returns and a hard cap at a margin of 3 (a 7–0 is treated the same as a 3–0), so one demolition can't distort a tournament. This is “what the scoreboard says.”
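As a rough sketch of the capped margin described above: the cap of 3 comes from the text, but the page does not give the exact diminishing-returns curve, so the square root used here is purely an assumption for illustration, and `capped_margin_signal` is a hypothetical name.

```python
import math

MARGIN_CAP = 3  # from the text: margins above 3 are treated the same


def capped_margin_signal(goals_for: int, goals_against: int) -> float:
    """Signed result signal with diminishing returns and a hard cap.

    The square-root curve is an assumed stand-in; the page only says that
    returns diminish and that the margin is capped at 3.
    """
    margin = goals_for - goals_against
    if margin == 0:
        return 0.0
    capped = min(abs(margin), MARGIN_CAP)  # hard cap: a 7-0 becomes a 3
    signal = math.sqrt(capped)             # assumed diminishing-returns curve
    return math.copysign(signal, margin)   # keep the winner/loser sign


# A 7-0 and a 3-0 produce the same signal once the cap applies.
same_after_cap = capped_margin_signal(7, 0) == capped_margin_signal(3, 0)
```

Whatever curve the engine actually uses, the property that matters is the one checked on the last line: beyond the cap, running up the score buys nothing.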
The performance rating ignores the scoreline and is fed an expected goal margin built only from the box score — shots on target and the rest, weighted as below. This is “what the underlying play says.” A team can win ugly (high results, low performance) or dominate and lose (low results, high performance). When the two ratings disagree, that disagreement is the actual finding — the team that played far better than its results, or got away with far less.
Both run through openskill (an open-source, patent-free Bayesian rating system), match by match, in chronological order.
From scratch — no priors
Every team starts at exactly the same rating. No seeds, no world rankings, no pre-tournament opinion of who's good. The only thing that moves a team is what happens on the pitch during this tournament.
That makes the early rankings weak — and that's deliberate, not a bug. After one match day the model knows almost nothing, and it says so rather than dressing up a guess as a verdict. A rating that looked confident on day one would be lying to you.
Why day-one rankings are nearly meaningless
There's a deeper reason the group stage can't be ranked cleanly, and it's worth being honest about. During the group stage the groups are islands — eight of them in 2022, twelve in the expanded 2026 format. Two teams in different groups have no opponent in common, not even indirectly, so there is genuinely no evidence linking them. The model literally cannot compare them yet; any cross-group order is a coin flip dressed as a table.
Only the knockouts wire the groups together. The first time a Group A team plays a Group B team, those islands join; a few rounds later the whole field is one connected graph and cross-team comparison finally means something.
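The island argument is a statement about graph connectivity, and it can be checked with a small union-find. This is a toy sketch with two hypothetical three-team groups (`A1`..`B3`), not the site's code: it counts how many disconnected clusters of teams exist given a list of played matches.

```python
from itertools import combinations


def count_islands(teams, matches):
    """Count connected components of the 'has played' graph via union-find."""
    parent = {team: team for team in teams}

    def find(team):
        # Walk up to the root, compressing the path as we go.
        while parent[team] != team:
            parent[team] = parent[parent[team]]
            team = parent[team]
        return team

    for a, b in matches:
        parent[find(a)] = find(b)  # a match links the two clusters

    return len({find(team) for team in teams})


group_a = ["A1", "A2", "A3"]  # hypothetical team labels
group_b = ["B1", "B2", "B3"]
teams = group_a + group_b

# Group stage: every pair inside a group plays, nobody crosses groups.
group_matches = list(combinations(group_a, 2)) + list(combinations(group_b, 2))
islands_after_groups = count_islands(teams, group_matches)

# One knockout tie between the groups is enough to join the islands.
islands_after_knockout = count_islands(teams, group_matches + [("A1", "B2")])
```

With only group matches the count equals the number of groups; a single cross-group knockout match drops it by one, which is the moment a cross-group comparison first has any evidence behind it.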
The performance formula, in numbers
The performance rating's input is an expected goal margin: take the difference between the two teams in each box-score stat (home minus away), multiply by the weights below, add them up. The headline is blunt — shots on target are nearly the whole story. Everything else is a rounding error by comparison.
Chart: standardized weight of each box-score stat. Bars to the right of the line push a team up, bars to the left push it down. Shots on target dwarf the rest; total shots is a small negative correction (off-target volume).
Two deliberate choices hide in there. Possession is ignored: across 128 matches its correlation with winning rounds to 0.00, essentially nothing. Keeping the ball is a style, not a result. And saves are excluded on purpose: a save is just an on-target shot that didn't go in, so adding saves to a formula that already counts on-target shots smuggles the scoreline back in and collapses the performance rating into the results rating. The total-shots term even goes slightly negative: once you know a team's on-target count, extra off-target shots are mild evidence of wastefulness, not threat.
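The circularity of saves can be shown with a few lines of arithmetic. The sketch below assumes the simplified identity that every on-target shot is either a goal or a save (it ignores own goals and on-target shots blocked by outfield players); `margin_from_saves` is a hypothetical helper, not part of the engine.

```python
def margin_from_saves(on_target_home, on_target_away, saves_home, saves_away):
    """Recover the goal margin from on-target shots and saves alone.

    Simplifying assumption: goals scored = own on-target shots minus the
    opposing keeper's saves.
    """
    goals_home = on_target_home - saves_away
    goals_away = on_target_away - saves_home
    return goals_home - goals_away


# Home: 6 on target, away keeper saves 4 -> 2 goals.
# Away: 3 on target, home keeper saves 2 -> 1 goal.
recovered_margin = margin_from_saves(6, 3, saves_home=2, saves_away=4)
```

Under that assumption, on-target shots plus saves determine the scoreline exactly, so a model given both is no longer measuring play independently of the result.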
The full formula and weights
expected margin = 0.10 + 0.318·Δon_target − 0.046·Δshots − 0.002·Δtouches_in_box + 0.017·Δcorners
Each Δ is the home-minus-away difference in that stat. Weights are in goal-margin units (a fit on the real goal margin, so the output is an honest expected-goals-style number), with the standardized column showing each stat's relative pull once the different scales are normalized.
| Stat (Δ home − away) | Weight (goals per unit of Δ) | Standardized |
|---|---|---|
| Shots on target | +0.318 | +1.19 |
| Total shots | −0.046 | −0.40 |
| Touches in the box | −0.002 | −0.09 |
| Corners | +0.017 | +0.07 |
Excluded entirely: saves (circular), possession, passes, pass accuracy, tackles — each near zero or, for possession, slightly negative against winning. An expected margin within 0.25 of zero is treated as a performance draw.
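The formula and the draw band can be applied directly. This is a minimal sketch using the published intercept, weights, and 0.25 threshold; the function names and the box-score numbers are made up for illustration, and treating a margin of exactly 0.25 as a draw is an assumption about the boundary.

```python
INTERCEPT = 0.10
WEIGHTS = {  # goals of expected margin per unit of (home - away) difference
    "on_target": 0.318,
    "shots": -0.046,
    "touches_in_box": -0.002,
    "corners": 0.017,
}
DRAW_BAND = 0.25  # expected margins within this distance of zero are draws


def expected_margin(home: dict, away: dict) -> float:
    """Weighted sum of home-minus-away box-score differences."""
    return INTERCEPT + sum(w * (home[s] - away[s]) for s, w in WEIGHTS.items())


def performance_outcome(margin: float) -> str:
    """Map an expected margin to a performance result for the rating update."""
    if abs(margin) <= DRAW_BAND:  # boundary inclusion is an assumption
        return "draw"
    return "home" if margin > 0 else "away"


# Hypothetical box score, not a real match.
home = {"on_target": 6, "shots": 14, "touches_in_box": 25, "corners": 5}
away = {"on_target": 3, "shots": 10, "touches_in_box": 20, "corners": 4}

margin = expected_margin(home, away)   # about 0.877
outcome = performance_outcome(margin)  # "home"
```

In this made-up match the three extra shots on target contribute 0.954 of the total, while the other three stats together move it by less than 0.2, which is the "nearly the whole story" claim in concrete terms.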
How confident the model is
Each rating carries an uncertainty that starts wide and narrows as a team plays more. Teams are ranked not by their raw rating but by a conservative estimate — the rating minus three times its uncertainty — so a team that looks great on a single result stays humble until it has backed it up. A barely-tested team can't leapfrog a proven one on one good night.
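The ranking rule above is a one-liner once a rating is stored as a mean and an uncertainty. The sketch below uses a hypothetical `Rating` class and invented numbers rather than the engine's real objects or outputs.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    team: str
    mu: float     # raw rating (the model's best guess)
    sigma: float  # uncertainty; narrows as the team plays more


def conservative_estimate(rating: Rating) -> float:
    """Rating minus three times its uncertainty, as described in the text."""
    return rating.mu - 3 * rating.sigma


# Invented numbers: one big night versus several solid results.
hot_start = Rating("Hot Start FC", mu=30.0, sigma=7.0)  # estimate 9.0
proven = Rating("Proven United", mu=27.0, sigma=4.0)    # estimate 15.0

table = sorted([hot_start, proven], key=conservative_estimate, reverse=True)
leader = table[0].team
```

The team with the higher raw rating ends up second because its uncertainty is still wide; it climbs only as more matches shrink `sigma`.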
Because a World Cup is short (seven games at most), that uncertainty only falls so far, so the page also shows a plain confidence label keyed to matches played — Low early, High by the knockouts. It's the honest summary of “how settled is this?”
How the weights were learned — and the honest ceiling
The weights aren't guessed. They're fit on 128 completed matches across the 2018 and 2022 World Cups, learning which box-score differences best predict the actual goal margin. Pooling two tournaments rather than one is the guard against overfitting a single month of football — and the headline held across both: shots on target dominate (a correlation of about 0.52 with goal margin, far above everything else), possession doesn't, saves are circular.
The test that matters: does rating teams by this formula track how far they actually went? It does. The performance index correlates with final finishing position in both tournaments (rank correlations of 0.42 in 2018 and 0.28 in 2022). The leaders are sane and the right outliers show up: a team that played like a contender and still went out in the group stage is exactly the kind of gap this is built to surface.
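For readers who want to reproduce that kind of check, here is a minimal sketch of a rank correlation. The page does not say which rank statistic it uses, so Spearman's is an assumption, and this version uses the textbook no-ties formula; real finishing positions contain ties (teams eliminated in the same round), which would need averaged ranks. The data is invented.

```python
def ranks(values):
    """Rank 1 for the largest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Invented example: five teams, higher "progress" means the team went further.
performance_index = [1.2, 0.8, 0.5, 0.1, -0.3]
progress = [5, 3, 4, 1, 2]

rho = spearman(performance_index, progress)
```

A value of 1 would mean the performance order matches the finishing order exactly; values like the 0.42 and 0.28 reported above indicate a real but loose relationship.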
And the honest part: this formula explains only about a quarter of the variation in goal margin (a cross-validated R² of roughly 0.26). That's not a failure to fix — football is high-variance, and the unexplained three-quarters is precisely the luck, the moments, and the finishing that the two-rating gap is designed to expose. A model claiming to explain much more would be fooling you.
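To make "explains about a quarter of the variation" concrete, here is a minimal sketch of the R² calculation itself, on invented numbers. The cross-validated figure quoted above would apply the same arithmetic to matches held out of the fit; the fold scheme is not stated on this page, so it is not reproduced here.

```python
def r_squared(actual, predicted):
    """Share of variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total
    return 1 - ss_res / ss_tot


# Invented goal margins and model outputs for four matches.
actual_margins = [2, 0, -1, 1]
predicted_margins = [1.0, 0.5, -0.5, 0.0]

explained = r_squared(actual_margins, predicted_margins)
```

An R² of 0.26 on the same scale means roughly three-quarters of the squared deviation in goal margin remains after the formula has had its say.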
What this is not
No betting lines, no win-probabilities-as-odds — ever. No pre-tournament opinion baked in. Team-level only; this isn't a player rating. It's a transparent, reproducible read on in-tournament evidence, with the formula, the weights, and the ceiling all on the table so you can disagree with it on the merits. — Claude
Mungomash is an independent reference site, not affiliated with FIFA. Rating system: openskill (MIT). Every figure on this page is read from the engine's published output, so when the model is recalibrated this page updates with it.