World Cup 2026 betting math turns a sprawling 48-team tournament into odds, probabilities, and fair prices you can compare to the market. This guide builds that stack with Elo-informed Poisson match models, explicit group-stage calculations, and Monte Carlo simulation through the knockout bracket. The result is a practical way to estimate advancement, outright, and match-level edges instead of guessing from headlines.

Why This Matters for Agents

The 2026 World Cup is the largest single betting event in sports history. Forty-eight teams across three host nations, 104 matches over 39 days, and futures markets that open years in advance with wide spreads and inefficient pricing. For an autonomous betting agent, this is a target-rich environment.

This is Layer 4 — Intelligence. A World Cup tournament model sits at the top of an agent’s prediction stack. It consumes Elo ratings and Poisson match models as inputs, runs Monte Carlo simulations to propagate uncertainty through the bracket, and outputs probabilities for every market type: outright winner, group winner, top goalscorer, round-of-advancement, and match-level bets. Those probabilities feed into expected value calculations and Kelly sizing at the decision layer. The agent then routes bets to the best-priced book — checking sharp offshore sportsbooks like BetOnline and BookMaker alongside prediction markets via the Agent Betting Stack.

The Math

Tournament Structure: 48 Teams, 12 Groups, 32-Team Knockout

The 2026 format is the first expanded World Cup. The structure:

Group Stage:  12 groups × 4 teams = 48 teams
              3 matches per team (full round-robin within group)
              Top 2 per group advance = 24 teams
              Best 8 third-place teams advance = 8 teams
              Total advancing: 32 teams

Knockout:     Round of 32 → Round of 16 → Quarterfinals →
              Semifinals → Third-place match → Final
              All knockout matches decided on the day
              (extra time + penalties if drawn after 90 min)

This structure changes the math compared to the old 32-team format. Groups still have 4 teams and 3 matches per team; the critical difference is that third-place teams can now qualify, which raises any given team's probability of advancing from the group stage: 32 of 48 teams (66.7%) advance, versus 16 of 32 (50%) under the old format.

Elo-Based Match Probability Model

The foundation of the tournament model is a per-match probability generator. We use Elo ratings converted to expected goals via the Poisson distribution.

Step 1: Convert Elo differential to win expectancy.

E_A = 1 / (1 + 10^((Elo_B - Elo_A) / 400))

E_A is team A's expected score against team B (win = 1, draw = 0.5, loss = 0); it doesn't separate draws from decisive results. And soccer has plenty of draws: roughly 25% of World Cup group stage matches end level. We need a three-outcome model.

Step 2: Convert win expectancy to expected goals (lambda).

The key insight: parameterize each team’s scoring rate as a function of the Elo differential, then use Poisson to generate three-outcome probabilities (win/draw/loss) from those scoring rates.

lambda_A = base_rate × 10^((Elo_A - Elo_B) / 800)
lambda_B = base_rate × 10^((Elo_B - Elo_A) / 800)

Where base_rate is the tournament average goals per team per match. Historical World Cup average: ~1.30 goals per team per match (2.60 total per match). The divisor 800 (instead of 400) maps the Elo scale to scoring rates: since lambda_A / lambda_B = 10^(diff/400), a 120-point Elo advantage roughly doubles a team's expected goals relative to its opponent.
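A quick numeric check of this mapping (a minimal sketch; the `lambdas` helper is illustrative, not part of the implementation below):

```python
BASE_RATE = 1.30  # tournament average goals per team per match

def lambdas(elo_diff: float, scale: float = 800) -> tuple[float, float]:
    """Expected goals (team, opponent) implied by an Elo differential."""
    return BASE_RATE * 10 ** (elo_diff / scale), BASE_RATE * 10 ** (-elo_diff / scale)

for diff in (70, 120, 200):
    la, lb = lambdas(diff)
    print(f"diff {diff:>3}: {la:.3f} vs {lb:.3f}  (ratio {la / lb:.2f})")
```

At a 120-point gap the ratio lands almost exactly at 2.0; at 200 points it is already above 3.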

Step 3: Apply host advantage.

If team_A is host playing at home venue: Elo_A += 100
If team_A is host playing at co-host venue: Elo_A += 40

Historical World Cup data: host nations win ~60% of group stage matches versus the ~45% baseline for similarly rated teams. The +100 Elo adjustment calibrates to this observed advantage. For the 2026 tri-host format, the US team playing in Dallas gets the full +100, but playing in Toronto gets a reduced +40 (familiar hemisphere, reduced travel, friendly crowd, but not home soil).

Group Stage: Exhaustive Outcome Enumeration

Each group plays 6 matches: the C(4,2) = 6 pairings of a 4-team round-robin (A vs B, A vs C, A vs D, B vs C, B vs D, C vs D). Each team plays 3 matches. Each match has 3 possible outcomes (home win, draw, away win), so a group has 3^6 = 729 possible outcome combinations.

For computational efficiency, we simulate each match’s scoreline using Poisson draws rather than enumerating all 729 categorical outcomes. This naturally handles goal difference tiebreakers.

For each group, simulate N scorelines per match → compute points, goal difference, goals scored → rank teams → determine which advance.

Points system:

  • Win: 3 points
  • Draw: 1 point
  • Loss: 0 points

Tiebreaker order:

  1. Points
  2. Goal difference
  3. Goals scored
  4. Head-to-head points
  5. Head-to-head goal difference
  6. Fair play (yellow/red cards) — modeled as random coin flip
  7. Drawing of lots — random
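As a cross-check on the sampling approach, the 729 categorical outcomes can be enumerated exactly (goal-based tiebreakers aside). A minimal sketch with flat, illustrative per-match probabilities:

```python
from itertools import product

# The 6 fixtures of a 4-team round-robin
fixtures = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
# (first-team win, draw, second-team win) probabilities per fixture.
# Flat illustrative values; a real model derives these from the Poisson lambdas.
probs = [(0.45, 0.27, 0.28)] * 6

points_dist = {}
for outcomes in product(range(3), repeat=6):  # 3^6 = 729 combinations
    p = 1.0
    pts = dict.fromkeys("ABCD", 0)
    for (home, away), o, (pw, pd, pl) in zip(fixtures, outcomes, probs):
        if o == 0:        # first-listed team wins
            p *= pw
            pts[home] += 3
        elif o == 1:      # draw
            p *= pd
            pts[home] += 1
            pts[away] += 1
        else:             # second-listed team wins
            p *= pl
            pts[away] += 3
    points_dist[pts["A"]] = points_dist.get(pts["A"], 0.0) + p

# Exact distribution of team A's final points total (8 points is impossible)
print({k: round(v, 4) for k, v in sorted(points_dist.items())})
```

This exact distribution is what the Monte Carlo points totals should converge to; the simulation is still needed for goal-difference tiebreakers.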

Best Third-Place Qualification

After all 12 groups complete, the 12 third-place teams are ranked:

  1. Points
  2. Goal difference
  3. Goals scored

The top 8 advance to the knockout stage. This cross-group comparison is critical: a third-place team from a weak group (scoring 4 points with +2 GD) likely advances, while a third-place team from a tough group (scoring 3 points with -1 GD) might not. The model must simulate all 12 groups jointly to capture this interaction.

Knockout Bracket Propagation

Once the 32 advancing teams are placed in the bracket, knockout matches are decided by single-game Poisson simulation. If the 90-minute simulation produces a draw, the model runs an extra-time simulation (with reduced scoring rates — lambda_extra = lambda_90 × 30/90 × 0.85, reflecting fatigue and conservative tactics), then a penalty shootout coin flip (historically ~50/50 after adjusting for home advantage and squad quality).

The bracket structure is predetermined by FIFA:

Round of 32 matchups (simplified):
1A vs 3C/D/E   |   1B vs 3A/F/G   |   ... (etc.)
2A vs 2C       |   2B vs 2D       |   ...

(Exact bracket placement depends on which third-place teams qualify)

The path-to-final matters. A team drawn into a bracket half with weaker opponents has higher advancement probability even with identical Elo. The model captures this by simulating the full bracket, not just individual matches.

Monte Carlo Full Tournament Simulation

The complete algorithm:

For each simulation i in 1..N:
    1. For each of the 12 groups:
       - Simulate 6 matches using Poisson(lambda_A), Poisson(lambda_B)
       - Compute standings (points, GD, GS)
       - Record 1st, 2nd, 3rd place
    2. Rank all 12 third-place teams
       - Select top 8
    3. Place 32 teams into knockout bracket per FIFA rules
    4. Simulate each knockout match:
       - 90-min Poisson simulation
       - If draw: extra time Poisson (reduced lambda)
       - If still draw: penalty shootout (coin flip with slight adjustment)
    5. Record tournament winner, runner-up, semifinalists, etc.

Aggregate over N simulations:
    P(team wins tournament) = count(team won final) / N
    P(team reaches QF) = count(team in QF or beyond) / N
    ...etc for all advancement rounds

With N = 100,000, the standard error on a 10% probability estimate is sqrt(0.1 × 0.9 / 100,000) ≈ 0.00095, i.e. just under 0.1 percentage points, which is precise enough for betting decisions.
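The error-bar arithmetic is worth keeping handy when choosing N; a minimal sketch:

```python
import math

def mc_se(p: float, n: int) -> float:
    """Standard error of a Monte Carlo estimate of probability p from n sims."""
    return math.sqrt(p * (1 - p) / n)

# Error bars on a 10% estimate at N = 100k vs N = 10k
print(f"{mc_se(0.10, 100_000):.5f}  {mc_se(0.10, 10_000):.5f}")
```

Cutting N by 10x roughly triples the error bar (it scales with 1/sqrt(N)).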

Worked Examples

Example 1: Group Stage — USA vs. England

Pre-tournament Elo ratings (approximate, March 2026):

USA:     1780  (hosts, playing in Atlanta)
England: 1950

Host advantage: +100 for USA → Effective Elo: 1880
Elo differential: 1880 - 1950 = -70

lambda_USA = 1.30 × 10^(-70/800) = 1.30 × 10^(-0.0875) = 1.30 × 0.817 = 1.062
lambda_ENG = 1.30 × 10^(70/800)  = 1.30 × 10^(0.0875)  = 1.30 × 1.224 = 1.591

Match outcome probabilities (from Poisson scoreline matrix, summing over all i,j):

P(USA win)  = 0.274  (27.4%)
P(Draw)     = 0.253  (25.3%)
P(ENG win)  = 0.473  (47.3%)

If BetOnline prices this at USA +190 / Draw +240 / England +110:

Implied probabilities (proportional removal of the ~11.5% overround):
  USA:  30.9% (from 34.5% raw)
  Draw: 26.4% (from 29.4% raw)
  ENG:  42.7% (from 47.6% raw)

Model vs. market:
  USA:  27.4% model vs 30.9% no-vig market → no edge on USA
  Draw: 25.3% model vs 26.4% no-vig market → no edge on draw
  ENG:  47.3% model vs 42.7% no-vig market → model 4.6pp above consensus

England is where model and market disagree most: a 47.3% model probability against a 42.7% vig-free consensus. Note the distinction between edge and expected value, though: the bet pays at the listed +110, whose breakeven probability is ~47.6%, essentially level with the model's 47.3%. The disagreement flags England as the side to monitor; the position becomes Kelly-worthy only once the price moves past the model's breakeven.
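A sketch of the odds arithmetic behind this example, using proportional normalization for vig removal (one convention among several, so the no-vig figures may differ slightly from those quoted above):

```python
def american_to_prob(odds: int) -> float:
    """American odds -> raw implied (breakeven) probability."""
    return 100 / (odds + 100) if odds > 0 else -odds / (-odds + 100)

def american_to_decimal(odds: int) -> float:
    """American odds -> decimal payout odds."""
    return 1 + odds / 100 if odds > 0 else 1 + 100 / -odds

prices = {"USA": 190, "Draw": 240, "ENG": 110}
model = {"USA": 0.274, "Draw": 0.253, "ENG": 0.473}

raw = {k: american_to_prob(o) for k, o in prices.items()}
overround = sum(raw.values())                      # ~1.115 for this line
fair = {k: p / overround for k, p in raw.items()}  # proportional vig removal

for k in prices:
    ev = model[k] * american_to_decimal(prices[k]) - 1  # EV per unit staked
    print(f"{k}: no-vig {fair[k]:.1%}, edge {model[k] - fair[k]:+.1%}, EV {ev:+.3f}")
```

The EV column is the check that matters for bet placement: a model probability above the vig-free consensus is not the same as positive expectation at the listed price.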

Example 2: Outright Winner Futures

After 100,000 Monte Carlo simulations with March 2026 Elo ratings:

Team            Elo    P(Win)   Fair Odds   BetOnline Odds   Edge
Brazil         2050    12.3%    +713        +800             +1.2pp
France         2030    11.1%    +801        +700             -1.4pp
Argentina      2000     9.8%    +920        +750             -2.0pp
England        1950     8.4%    +1090       +900             -1.6pp
Spain          1980     7.9%    +1166       +1000            -1.2pp
Germany        1920     5.6%    +1686       +1400            -1.1pp
Portugal       1910     5.1%    +1861       +1800            -0.2pp
Netherlands    1890     4.3%    +2226       +2500            +0.5pp
USA (host)     1780     4.2%    +2281       +1600            -1.7pp

(Edge = model P(Win) minus the raw implied probability of the listed odds.)

The model identifies Brazil at +800 and Netherlands at +2500 as the two most attractive outright bets. Brazil offers edge because the market slightly underestimates their Elo-derived tournament path probability. Netherlands offers edge because a favorable draw path inflates their advancement probability beyond what their Elo alone suggests.
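The fair-odds column follows mechanically from the simulated probabilities; a minimal converter:

```python
def prob_to_american(p: float) -> int:
    """Fair (no-vig) American odds for a win probability p."""
    dec = 1 / p  # fair decimal odds
    if dec >= 2:
        return round((dec - 1) * 100)    # plus-money side
    return round(-100 / (dec - 1))       # favorite side

for team, p in [("Brazil", 0.123), ("England", 0.084), ("Netherlands", 0.043)]:
    print(team, f"+{prob_to_american(p)}")
```

Applied to the table's probabilities this reproduces the fair-odds column (+713, +1090, +2226); probabilities of 50% or more would map to negative American odds.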

Example 3: Best Third-Place Cutoff

In 100,000 simulations, the 8th-best third-place team’s typical profile:

Median cutoff:  4 points, +0 goal difference
75th percentile: 4 points, +1 GD
25th percentile: 3 points, +1 GD

P(3 points, 0 GD advances as 3rd) = 38.2%
P(4 points advances as 3rd)        = 94.7%
P(3 points, -1 GD advances as 3rd) = 12.1%

This means a team needing a draw in their final group match to reach 4 points should be heavily favored to advance. The model prices “team X to qualify from group” markets by summing P(finish 1st) + P(finish 2nd) + P(finish 3rd) × P(3rd-place rank in top 8 | their points and GD).
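The pricing formula in that last sentence is plain arithmetic; a sketch with illustrative component probabilities (the 0.30 / 0.28 / 0.22 / 0.55 values are assumptions, not model output):

```python
# Component probabilities for one team, as produced by group-stage simulation
p_first, p_second, p_third = 0.30, 0.28, 0.22

# P(a third-place finish ranks in the top 8), conditioned on the team's
# simulated points/GD profile and averaged over simulations (assumed value)
p_third_advances = 0.55

p_qualify = p_first + p_second + p_third * p_third_advances
print(f"P(qualify from group) = {p_qualify:.3f}")
```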

Implementation

import numpy as np
from scipy.stats import poisson
from dataclasses import dataclass
import pandas as pd
from typing import Optional


@dataclass
class Team:
    """Represents a national team with Elo rating and host status."""
    name: str
    elo: float
    is_host: bool = False
    host_country: str = ""  # "USA", "CAN", "MEX"


@dataclass
class MatchResult:
    """Result of a simulated match."""
    team_a: str
    team_b: str
    goals_a: int
    goals_b: int

    @property
    def winner(self) -> Optional[str]:
        if self.goals_a > self.goals_b:
            return self.team_a
        elif self.goals_b > self.goals_a:
            return self.team_b
        return None


def elo_to_lambda(
    elo_a: float,
    elo_b: float,
    base_rate: float = 1.30,
    elo_scale: float = 800
) -> tuple[float, float]:
    """
    Convert Elo ratings to Poisson lambda parameters for each team.

    Args:
        elo_a: Team A's effective Elo rating (including host bonus)
        elo_b: Team B's effective Elo rating
        base_rate: Tournament average goals per team per match (World Cup ~1.30)
        elo_scale: Divisor mapping Elo to scoring rate (800 calibrated to World Cup data)

    Returns:
        (lambda_a, lambda_b): Expected goals per match for each team
    """
    diff = elo_a - elo_b
    lambda_a = base_rate * 10 ** (diff / elo_scale)
    lambda_b = base_rate * 10 ** (-diff / elo_scale)
    return lambda_a, lambda_b


def simulate_match(
    team_a: Team,
    team_b: Team,
    venue_country: str = "",
    base_rate: float = 1.30,
    rng: Optional[np.random.Generator] = None
) -> MatchResult:
    """
    Simulate a single match using Poisson goal scoring.

    Applies host advantage: +100 Elo for home venue, +40 for co-host venue.
    """
    if rng is None:
        rng = np.random.default_rng()

    elo_a = team_a.elo
    elo_b = team_b.elo

    # Host advantage
    if team_a.is_host:
        if team_a.host_country == venue_country:
            elo_a += 100  # Full home advantage
        elif venue_country in ("USA", "CAN", "MEX"):
            elo_a += 40   # Co-host advantage
    if team_b.is_host:
        if team_b.host_country == venue_country:
            elo_b += 100
        elif venue_country in ("USA", "CAN", "MEX"):
            elo_b += 40

    lambda_a, lambda_b = elo_to_lambda(elo_a, elo_b, base_rate)
    goals_a = rng.poisson(lambda_a)
    goals_b = rng.poisson(lambda_b)

    return MatchResult(team_a.name, team_b.name, goals_a, goals_b)


def simulate_knockout_match(
    team_a: Team,
    team_b: Team,
    venue_country: str = "",
    rng: Optional[np.random.Generator] = None
) -> str:
    """
    Simulate a knockout match with extra time and penalties.
    Returns the name of the winning team.
    """
    if rng is None:
        rng = np.random.default_rng()

    result = simulate_match(team_a, team_b, venue_country, rng=rng)

    if result.winner is not None:
        return result.winner

    # Extra time: 30 minutes with reduced scoring rate
    # Fatigue + conservative tactics reduce lambda by ~15% and scale to 30/90
    elo_a_eff = team_a.elo + (100 if team_a.is_host and team_a.host_country == venue_country else 0)
    elo_b_eff = team_b.elo + (100 if team_b.is_host and team_b.host_country == venue_country else 0)
    lam_a, lam_b = elo_to_lambda(elo_a_eff, elo_b_eff)
    et_factor = (30 / 90) * 0.85
    et_goals_a = rng.poisson(lam_a * et_factor)
    et_goals_b = rng.poisson(lam_b * et_factor)

    if et_goals_a > et_goals_b:
        return team_a.name
    elif et_goals_b > et_goals_a:
        return team_b.name

    # Penalty shootout: model as ~50/50 with slight home advantage
    home_boost = 0.03 if (team_a.is_host and team_a.host_country == venue_country) else 0.0
    if rng.random() < 0.5 + home_boost:
        return team_a.name
    return team_b.name


def simulate_group(
    teams: list[Team],
    venue_country: str = "",
    rng: Optional[np.random.Generator] = None
) -> pd.DataFrame:
    """
    Simulate a full round-robin group stage for 4 teams.
    Returns a DataFrame with standings sorted by points, GD, GS.
    """
    if rng is None:
        rng = np.random.default_rng()

    stats = {t.name: {"points": 0, "gf": 0, "ga": 0, "team": t} for t in teams}

    # Generate all 6 pairings
    matches = [(i, j) for i in range(4) for j in range(i + 1, 4)]

    for i, j in matches:
        result = simulate_match(teams[i], teams[j], venue_country, rng=rng)
        stats[result.team_a]["gf"] += result.goals_a
        stats[result.team_a]["ga"] += result.goals_b
        stats[result.team_b]["gf"] += result.goals_b
        stats[result.team_b]["ga"] += result.goals_a

        if result.goals_a > result.goals_b:
            stats[result.team_a]["points"] += 3
        elif result.goals_b > result.goals_a:
            stats[result.team_b]["points"] += 3
        else:
            stats[result.team_a]["points"] += 1
            stats[result.team_b]["points"] += 1

    rows = []
    for name, s in stats.items():
        rows.append({
            "team": name,
            "points": s["points"],
            "gd": s["gf"] - s["ga"],
            "gf": s["gf"],
            "team_obj": s["team"]
        })

    df = pd.DataFrame(rows)
    df = df.sort_values(
        by=["points", "gd", "gf"],
        ascending=[False, False, False]
    ).reset_index(drop=True)
    df["rank"] = df.index + 1
    return df


def select_best_third_place(
    third_place_teams: list[dict],
    n_advance: int = 8
) -> list[dict]:
    """
    Rank third-place teams across all 12 groups and select top 8.

    Args:
        third_place_teams: List of dicts with 'team', 'points', 'gd', 'gf', 'team_obj'
        n_advance: Number of third-place teams that advance (8 for 2026 format)

    Returns:
        List of the n_advance best third-place teams
    """
    sorted_teams = sorted(
        third_place_teams,
        key=lambda x: (x["points"], x["gd"], x["gf"]),
        reverse=True
    )
    return sorted_teams[:n_advance]


def simulate_tournament(
    groups: dict[str, list[Team]],
    venue_map: Optional[dict[str, str]] = None,
    rng: Optional[np.random.Generator] = None
) -> dict:
    """
    Simulate the entire 2026 World Cup tournament.

    Args:
        groups: Dict mapping group letter to list of 4 Team objects
        venue_map: Optional dict mapping group letter to venue country
        rng: NumPy random generator for reproducibility

    Returns:
        Dict with 'winner', 'advancement' (team -> furthest round reached), and 'group_results'
    """
    if rng is None:
        rng = np.random.default_rng()
    if venue_map is None:
        venue_map = {g: "USA" for g in groups}

    # Phase 1: Group stage
    group_results = {}
    third_place_teams = []
    advancing = []

    for group_letter, team_list in groups.items():
        standings = simulate_group(
            team_list,
            venue_country=venue_map.get(group_letter, "USA"),
            rng=rng
        )
        group_results[group_letter] = standings

        # Top 2 advance
        for idx in range(2):
            row = standings.iloc[idx]
            advancing.append({
                "team": row["team"],
                "team_obj": row["team_obj"],
                "group": group_letter,
                "group_rank": idx + 1
            })

        # Record third-place team
        third_row = standings.iloc[2]
        third_place_teams.append({
            "team": third_row["team"],
            "team_obj": third_row["team_obj"],
            "group": group_letter,
            "points": third_row["points"],
            "gd": third_row["gd"],
            "gf": third_row["gf"]
        })

    # Best 8 third-place teams advance
    best_thirds = select_best_third_place(third_place_teams, 8)
    for t in best_thirds:
        advancing.append({
            "team": t["team"],
            "team_obj": t["team_obj"],
            "group": t["group"],
            "group_rank": 3
        })

    # Phase 2: Knockout stage (simplified bracket seeding)
    # Build lookup for advancing teams
    team_lookup = {t["team"]: t["team_obj"] for t in advancing}
    team_names = [t["team"] for t in advancing]

    # Simplified bracket: random draw among the 32 qualifiers. The real FIFA
    # bracket is predetermined by group position (see Limitations, item 4).
    rng.shuffle(team_names)
    bracket = list(team_names)  # exactly 32: 24 group qualifiers + 8 best thirds

    # Simulate knockout rounds
    advancement = {name: "R32" for name in bracket}
    round_names = ["R32", "R16", "QF", "SF", "F"]

    current_round = bracket
    for round_idx, round_name in enumerate(round_names):
        next_round = []
        for i in range(0, len(current_round), 2):
            if i + 1 >= len(current_round):
                next_round.append(current_round[i])
                continue
            team_a_obj = team_lookup[current_round[i]]
            team_b_obj = team_lookup[current_round[i + 1]]
            winner = simulate_knockout_match(team_a_obj, team_b_obj, "USA", rng=rng)
            next_round.append(winner)
            if round_idx + 1 < len(round_names):
                advancement[winner] = round_names[round_idx + 1]

        current_round = next_round
        if len(current_round) == 1:
            break

    winner = current_round[0]
    advancement[winner] = "WINNER"

    return {
        "winner": winner,
        "advancement": advancement,
        "group_results": group_results
    }


def run_monte_carlo(
    groups: dict[str, list[Team]],
    n_simulations: int = 100_000,
    seed: int = 42
) -> pd.DataFrame:
    """
    Run N Monte Carlo simulations of the full World Cup tournament.
    Returns a DataFrame with advancement probabilities for each team.

    Args:
        groups: Dict mapping group letter to list of 4 Team objects
        n_simulations: Number of tournament simulations (100k recommended)
        seed: Random seed for reproducibility
    """
    rng = np.random.default_rng(seed)

    all_teams = set()
    for team_list in groups.values():
        for team in team_list:
            all_teams.add(team.name)

    round_counts = {
        team: {"group_exit": 0, "R32": 0, "R16": 0, "QF": 0, "SF": 0, "F": 0, "WINNER": 0}
        for team in all_teams
    }

    for _ in range(n_simulations):
        result = simulate_tournament(groups, rng=rng)

        for team in all_teams:
            if team in result["advancement"]:
                round_reached = result["advancement"][team]
                round_counts[team][round_reached] += 1
            else:
                round_counts[team]["group_exit"] += 1

    # Convert counts to P(furthest round reached); cumulative "reached at
    # least" probabilities follow by summing from WINNER backward
    rows = []
    for team, counts in round_counts.items():
        row = {"team": team}
        for round_name, count in counts.items():
            row[f"P({round_name})"] = count / n_simulations
        rows.append(row)

    df = pd.DataFrame(rows)
    df = df.sort_values("P(WINNER)", ascending=False).reset_index(drop=True)
    return df


def match_probabilities(
    team_a: Team,
    team_b: Team,
    venue_country: str = "",
    max_goals: int = 8
) -> dict[str, float]:
    """
    Compute exact win/draw/loss probabilities for a match using Poisson.
    No simulation — this is the analytical solution.

    Args:
        team_a: First team
        team_b: Second team
        venue_country: Country where match is played (for host advantage)
        max_goals: Maximum goals to consider per team (8 is sufficient)

    Returns:
        Dict with 'win_a', 'draw', 'win_b' probabilities
    """
    elo_a = team_a.elo
    elo_b = team_b.elo

    if team_a.is_host and team_a.host_country == venue_country:
        elo_a += 100
    elif team_a.is_host and venue_country in ("USA", "CAN", "MEX"):
        elo_a += 40

    if team_b.is_host and team_b.host_country == venue_country:
        elo_b += 100
    elif team_b.is_host and venue_country in ("USA", "CAN", "MEX"):
        elo_b += 40

    lam_a, lam_b = elo_to_lambda(elo_a, elo_b)

    win_a = 0.0
    draw = 0.0
    win_b = 0.0

    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson.pmf(i, lam_a) * poisson.pmf(j, lam_b)
            if i > j:
                win_a += p
            elif i == j:
                draw += p
            else:
                win_b += p

    return {"win_a": win_a, "draw": draw, "win_b": win_b}


# --- Example usage ---
if __name__ == "__main__":
    # Define a sample group
    group_a = [
        Team("USA", 1780, is_host=True, host_country="USA"),
        Team("England", 1950),
        Team("Senegal", 1620),
        Team("Chile", 1640),
    ]

    # Analytical match probabilities
    probs = match_probabilities(group_a[0], group_a[1], venue_country="USA")
    print(f"USA vs England (in USA):")
    print(f"  USA win:  {probs['win_a']:.1%}")
    print(f"  Draw:     {probs['draw']:.1%}")
    print(f"  ENG win:  {probs['win_b']:.1%}")
    print()

    # Simulate group stage
    rng = np.random.default_rng(42)
    standings = simulate_group(group_a, venue_country="USA", rng=rng)
    print("Group A standings (1 simulation):")
    print(standings[["team", "points", "gd", "gf"]].to_string(index=False))

Limitations and Edge Cases

1. Limited international data. National teams play 10-15 competitive matches per year. A team’s Elo rating has wide confidence intervals — England’s “true” Elo might be anywhere in a 100-point range. The model treats Elo as a point estimate, which overstates confidence. The fix: use Glicko-2 with rating deviation to quantify uncertainty, and apply Bayesian shrinkage toward club-level priors (average the Elo of each player’s club team, weighted by minutes played).
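One lightweight version of that shrinkage, with an assumed weighting constant (the `k = 20` default is illustrative, not a calibrated value):

```python
def shrunk_elo(intl_elo: float, club_prior: float, n_matches: int, k: float = 20) -> float:
    """Shrink a national team's Elo toward a club-minutes-derived prior.

    Weight on the international estimate grows with sample size:
    w = n / (n + k). The k = 20 default is an assumed tuning constant.
    """
    w = n_matches / (n_matches + k)
    return w * intl_elo + (1 - w) * club_prior

# England: international Elo 1950, club-weighted prior 1905 (illustrative),
# 12 competitive matches observed
print(round(shrunk_elo(1950, 1905, 12)))  # -> 1922
```

With few observed matches the estimate leans on the club prior; as competitive matches accumulate, the international rating dominates.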

2. Squad turnover. The 2022 World Cup Brazil team is not the 2026 World Cup Brazil team. Key retirements (e.g., aging stars who won’t make the squad) and breakout players (club performers earning first caps) change a team’s true strength in ways Elo doesn’t capture until matches are played. An agent should manually adjust Elo estimates based on squad announcements, applying a discount factor to historical Elo proportional to roster turnover.

3. The xG data sparsity problem. Shot-level expected goals models trained on club data don’t transfer cleanly to international soccer. Tactical systems differ, player combinations are less rehearsed, and the sample size per team is tiny. An international xG model needs heavy regularization and Bayesian priors from club-level data. For the World Cup, treat xG as a supporting signal to Elo, not a replacement.

4. Draw path dependency. The model assumes random bracket assignment for knockout rounds, but FIFA’s actual bracket is predetermined by group placement. A team finishing first in Group A faces a specific set of possible opponents, not a random draw. The full model must implement FIFA’s exact bracket structure — simplified random seeding underestimates path difficulty for some teams and overestimates it for others.

5. Motivation and game state effects. In the final group match, a team already qualified might rest starters. A team needing a draw for 4 points (virtually guaranteed advancement as a third-place team) plays differently than a team needing a win. The Poisson model ignores these strategic incentives. Incorporating game-state-dependent lambda adjustments is possible but requires careful calibration to avoid overfitting.

6. Penalty shootout modeling. The model treats shootouts as ~50/50 coin flips. In reality, shootout outcomes correlate with squad quality, goalkeeper ability, and psychological factors. Historical data shows South American teams outperform European teams in shootouts by ~5 percentage points. This is a refinable edge for an agent willing to build a shootout sub-model.

FAQ

How does the 2026 World Cup 48-team format work mathematically?

The 2026 World Cup has 12 groups of 4 teams. Each group plays 6 matches (full round-robin). The top 2 teams per group advance automatically (24 teams), plus the 8 best third-place teams across all 12 groups, yielding a 32-team knockout bracket. Best third-place ranking uses points, then goal difference, then goals scored as tiebreakers. The expansion from 32 to 48 teams increases any individual team’s probability of advancing from the group stage — in the old format, 50% of teams advanced; in the new format, 66.7% advance.

How do you simulate World Cup match outcomes with a Poisson model?

Estimate each team’s expected goals (lambda) from Elo rating differentials: lambda_A = base_rate × 10^((Elo_A - Elo_B) / 800), where base_rate is the tournament average (~1.30 goals per team per match). Then P(team_A scores k goals) = Poisson(k, lambda_A). The joint probability of any scoreline is P(A=i) × P(B=j), assuming independence. Aggregate over all scorelines to get win/draw/loss probabilities. Apply the Dixon-Coles correction for more accurate low-score estimates.
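The Dixon-Coles correction itself is a small multiplier tau on the four lowest scorelines; a sketch with an assumed rho of -0.10 (the parameter must be fit from data):

```python
from scipy.stats import poisson

def dc_scoreline_prob(i: int, j: int, lam_a: float, lam_b: float,
                      rho: float = -0.10) -> float:
    """Joint P(A scores i, B scores j) with the Dixon-Coles low-score adjustment.

    rho must be fit from data; -0.10 here is an assumed illustrative value.
    Negative rho boosts 0-0 and 1-1 relative to independent Poisson.
    """
    if i == 0 and j == 0:
        tau = 1 - lam_a * lam_b * rho
    elif i == 0 and j == 1:
        tau = 1 + lam_a * rho
    elif i == 1 and j == 0:
        tau = 1 + lam_b * rho
    elif i == 1 and j == 1:
        tau = 1 - rho
    else:
        tau = 1.0
    return tau * poisson.pmf(i, lam_a) * poisson.pmf(j, lam_b)

# The correction redistributes mass among the four low scorelines but the
# joint distribution still sums to 1
total = sum(dc_scoreline_prob(i, j, 1.06, 1.59) for i in range(12) for j in range(12))
print(round(total, 4))
```

The adjustments cancel in aggregate, so win/draw/loss probabilities can be computed from this pmf exactly as with the plain Poisson matrix.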

How does home advantage work when three countries host the World Cup?

Historical World Cup data shows host nations receive roughly +100 Elo points equivalent in match prediction models. For the 2026 World Cup co-hosted by USA, Canada, and Mexico, the host advantage applies fully (+100 Elo) when a host nation plays at a venue in their own country. Matches at co-host venues receive a reduced boost (+40 Elo) due to reduced travel and familiar conditions but absence of home-soil crowd energy.

How do you price World Cup outright winner futures from a model?

Run 100,000+ Monte Carlo simulations of the full tournament. Count how many times each team wins the final. The fair probability is wins / total_simulations. Fair decimal odds = 1 / probability. Compare against sportsbook futures odds to find positive expected value. A model giving Brazil 12.3% probability implies fair odds of +713. If BetOnline prices them at +800 (implied 11.1%), the 1.2 percentage point edge justifies a position sized with Kelly.

Why is international soccer harder to model than club soccer?

National teams play 10-15 competitive matches per year versus 38+ league matches for club teams. Parameter estimates are noisy — a team’s Elo rating has wide confidence intervals after so few observations. The solution is Bayesian shrinkage: weight recent international results against club-level priors (average Elo of players’ club teams). Squad turnover between tournaments also makes historical data partially obsolete, requiring manual Elo adjustments when squads are announced.

What’s Next

The Poisson match model and Elo rating system form the foundation of this tournament model. For deeper dives into the components: