Political prediction markets are the highest-volume category on Polymarket. Model them with two inputs: economic fundamentals (GDP, unemployment, inflation) for the baseline and inverse-variance weighted poll aggregation for the signal. Convert poll margins to win probabilities using a t-distribution (df=4-8), not a normal distribution — elections have fat tails. State-level correlations (rho ~ 0.75 between swing states) mean you must model joint probabilities, not individual races in isolation.
Why This Matters for Agents
Political prediction markets represent the single largest liquidity pool on Polymarket. The 2024 U.S. presidential election markets exceeded $3.5 billion in cumulative trading volume. For autonomous agents, this is the highest-stakes application of Layer 4 — Intelligence in the Agent Betting Stack.
The challenge: political markets behave differently from sports markets. An NFL season produces 272 regular-season games — enough data for rapid model calibration. A presidential election happens once every four years. Sample sizes are small, fundamentals shift slowly, and the information environment is dominated by polls with known biases. An agent that applies sports-betting logic directly to political markets will fail. The math in this guide builds the specialized intelligence module an agent needs for political market pricing — from raw economic data and polls through to a calibrated win probability that feeds into the expected value and Kelly sizing pipeline.
The Math
The Fundamentals Model
Economic conditions predict incumbent party performance. This is one of the most robust empirical findings in political science. The core regression:
V_incumbent = alpha + beta_1 * GDP_growth + beta_2 * unemployment_change + beta_3 * inflation + epsilon
Where V_incumbent is the incumbent party’s two-party vote share, GDP_growth is annualized real GDP growth in Q2 of the election year, unemployment_change is the year-over-year change in unemployment rate, inflation is the year-over-year CPI change, and epsilon is the error term.
Historical coefficients estimated from 1948-2024 presidential elections (n=20):
| Variable | Coefficient | Interpretation |
|---|---|---|
| Intercept (alpha) | 48.5% | Baseline incumbent vote share |
| GDP growth (beta_1) | +0.65 | Each 1pp GDP growth adds ~0.65pp vote share |
| Unemployment change (beta_2) | -1.2 | Each 1pp rise in unemployment costs ~1.2pp |
| Inflation (beta_3) | -0.4 | Each 1pp inflation costs ~0.4pp |
The fundamentals model works well far from the election (6+ months out) when polling is sparse. Its RMSE is approximately 3.5 percentage points on the two-party vote share. As the election approaches, polls become more informative and the model should shift weight toward polling data.
Poll Aggregation: Inverse-Variance Weighting
Raw polls are noisy. A single poll with n=800 likely voters has a margin of error around 3.5 points. Aggregation reduces this noise.
The minimum-variance estimator uses inverse-variance weighting. For poll i with sample size n_i reporting candidate support p_i:
Standard error: sigma_i = sqrt(p_i * (1 - p_i) / n_i)
Weight: w_i = 1 / sigma_i^2
Weighted average: p_hat = sum(w_i * p_i) / sum(w_i)
Aggregate SE: sigma_agg = 1 / sqrt(sum(w_i))
This is the foundation. Every serious aggregation model (FiveThirtyEight, Silver Bulletin, The Economist) starts here, then adds corrections.
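As a quick sketch, here is the estimator in a few lines of Python. The three polls (shares and sample sizes) are hypothetical, chosen only to illustrate the mechanics:

```python
import numpy as np

# Three hypothetical polls: Dem share (as a fraction) and sample size
p = np.array([0.49, 0.502, 0.485])
n = np.array([1250, 1100, 900])

sigma = np.sqrt(p * (1 - p) / n)       # per-poll standard error
w = 1.0 / sigma**2                     # inverse-variance weights
p_hat = np.sum(w * p) / np.sum(w)      # minimum-variance pooled share
sigma_agg = 1.0 / np.sqrt(np.sum(w))   # aggregate standard error

print(f"pooled share: {p_hat:.4f} +/- {sigma_agg:.4f}")
```

The aggregate SE comes out smaller than the SE of even the largest individual poll, which is the entire point of pooling.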
Pollster House Effects
Not all pollsters are created equal. Some consistently lean Democratic, others Republican. A house effect is a pollster-specific bias term:
p_observed_ij = p_true_j + h_i + epsilon_ij
Where p_observed_ij is pollster i’s result for race j, p_true_j is the true population parameter, h_i is pollster i’s house effect (constant across races), and epsilon_ij is random sampling error.
Estimate h_i from historical data: compare each pollster’s final pre-election polls to actual results across multiple election cycles. A pollster that consistently overestimates Democrats by 1.5 points gets h_i = +1.5 (assuming positive = Democratic lean).
After estimating house effects, adjust each poll before aggregation:
p_adjusted_ij = p_observed_ij - h_i
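The house effects themselves have to come from somewhere. A minimal sketch of the historical-comparison estimate described above, where `history` and both pollster names are made-up track records of (final poll margin, actual margin) pairs:

```python
import numpy as np

# pollster -> [(final_poll_dem_margin, actual_dem_margin), ...] across past races
history = {
    "PollsterA": [(3.0, 1.5), (2.0, 0.8), (-1.0, -2.4)],
    "PollsterB": [(0.5, 1.2), (-2.0, -1.1), (1.0, 1.9)],
}

# House effect = average signed miss; positive = Democratic lean
house_effects = {
    name: float(np.mean([poll - actual for poll, actual in pairs]))
    for name, pairs in history.items()
}
print(house_effects)  # PollsterA consistently overstates the Dem margin (D lean)
```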
From Poll Margin to Win Probability
An aggregated poll margin of +3.2 points does not mean a 100% win probability. The margin must be converted to a probability accounting for forecast uncertainty.
The standard approach uses the Student's t-distribution, not the normal distribution. Historical presidential election forecast errors are fat-tailed — upsets happen more often than a Gaussian model predicts. The 2016 election demonstrated this: national polling averages showed Clinton +3.3. Her 2.1-point popular-vote win was within normal polling error, but correlated state-level misses produced an Electoral College upset that a thin-tailed Gaussian model would have treated as a multi-sigma event.
P(win) = 1 - t_cdf(0, df=nu, loc=margin, scale=sigma_forecast)
Where margin is the aggregated poll lead, sigma_forecast is the forecast standard deviation (incorporating polling error, fundamentals uncertainty, and systematic error), nu is the degrees of freedom (typically 4-8 based on historical calibration), and t_cdf is the CDF of the Student’s t-distribution.
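With scipy the conversion is a one-liner. The sketch below evaluates it at an illustrative +3.0 margin with sigma = 3.5 for several df values:

```python
from scipy import stats

margin, sigma = 3.0, 3.5  # aggregated lead (pp) and forecast SD (pp)

for df in (4, 6, 10):
    p_win = 1.0 - stats.t.cdf(0, df=df, loc=margin, scale=sigma)
    print(f"df={df:>2}: P(win) = {p_win:.1%}")

p_norm = 1.0 - stats.norm.cdf(0, loc=margin, scale=sigma)
print(f"normal: P(win) = {p_norm:.1%}")  # thinner tails, higher P(win)
```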
The degrees of freedom parameter controls tail thickness:
| df | Character | P(win) at +3.0 margin, sigma=3.5 |
|---|---|---|
| 4 | Very fat tails | 78.0% |
| 6 | Moderately fat | 78.8% |
| 10 | Approaching normal | 79.4% |
| infinity | Normal distribution | 80.4% |
Even the 2.4-point spread in win probability between df=4 and df=infinity is significant in a prediction market context: it represents more than 2 cents of edge on a $1.00 contract.
State-Level Correlation Modeling
Presidential elections are decided state by state through the Electoral College. An agent cannot model each state independently because swing states are correlated.
The joint distribution of state outcomes follows a multivariate normal:
X ~ MVN(mu, Sigma)
Where X is the vector of state-level vote margins, mu is the vector of expected margins (from state polls + fundamentals), and Sigma is the covariance matrix.
The covariance matrix encodes the critical insight: if Pennsylvania shifts 2 points toward a candidate, Michigan and Wisconsin shift approximately 1.5 points in the same direction. Empirical correlations from 2000-2024 presidential elections:
Correlation Matrix (selected swing states):
PA MI WI AZ GA NV
PA 1.00 0.82 0.78 0.55 0.48 0.60
MI 0.82 1.00 0.85 0.50 0.45 0.58
WI 0.78 0.85 1.00 0.48 0.42 0.55
AZ 0.55 0.50 0.48 1.00 0.72 0.68
GA 0.48 0.45 0.42 0.72 1.00 0.62
NV 0.60 0.58 0.55 0.68 0.62 1.00
The Rust Belt states (PA, MI, WI) form a tightly correlated cluster (rho ~ 0.78-0.85). The Sun Belt states (AZ, GA, NV) form another cluster (rho ~ 0.62-0.72). Cross-cluster correlations are moderate (rho ~ 0.42-0.60).
Ignoring these correlations leads to a specific, exploitable error: underestimating the probability of a sweep. If you model states independently, the probability of winning PA AND MI AND WI is P(PA) * P(MI) * P(WI). With correlations, the joint probability is much higher — winning one makes winning the others far more likely.
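A short Monte Carlo sketch makes the sweep effect concrete. The three Rust Belt margins and the flat within-cluster correlation of 0.8 are illustrative assumptions:

```python
import numpy as np
from scipy import stats

mu = np.array([1.8, 2.1, 1.5])  # hypothetical Dem margins (pp): PA, MI, WI
sigma, rho = 3.5, 0.8

corr = np.full((3, 3), rho)
np.fill_diagonal(corr, 1.0)
cov = corr * sigma**2

rng = np.random.default_rng(0)
draws = rng.multivariate_normal(mu, cov, size=200_000)
p_sweep_corr = float(np.mean((draws > 0).all(axis=1)))

# Independence assumption: product of marginal win probabilities
p_each = 1.0 - stats.norm.cdf(0, loc=mu, scale=sigma)
p_sweep_indep = float(np.prod(p_each))

print(f"sweep prob, independent: {p_sweep_indep:.1%}")
print(f"sweep prob, correlated:  {p_sweep_corr:.1%}")  # far higher
```

The correlated model puts substantially more mass on all-three-states outcomes (sweeps in both directions) than the independent product does.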
Worked Examples
Example 1: Fundamentals Model for 2024
Going into Q3 2024, the economic indicators were:
GDP growth (Q2 annualized): 2.8%
Unemployment change (YoY): +0.4pp (from 3.4% to 3.8%)
Inflation (YoY CPI): 3.0%
Plugging into the fundamentals model:
V_incumbent = 48.5 + 0.65(2.8) + (-1.2)(0.4) + (-0.4)(3.0)
V_incumbent = 48.5 + 1.82 - 0.48 - 1.20
V_incumbent = 48.64%
The fundamentals model predicted the incumbent party (Democrats) at 48.64% of the two-party vote — slightly below 50%, suggesting a narrow loss. The actual result was approximately 48.4% — the fundamentals model was within 0.3 points.
On Polymarket in July 2024, Biden YES (before withdrawal) traded at approximately $0.32, and the Democratic nominee field was priced at roughly $0.45. The fundamentals model suggested the market was pricing the Democrats correctly in the range of a coin flip, once you accounted for the candidate-specific uncertainty around Biden’s withdrawal.
Example 2: Poll Aggregation for a Swing State
Consider three Pennsylvania polls from October 2024:
| Pollster | Sample Size | Dem % | Rep % | Margin | House Effect |
|---|---|---|---|---|---|
| Marist | 1,250 | 49.0 | 47.5 | +1.5 | -0.3 (slight R lean) |
| Quinnipiac | 1,100 | 50.2 | 46.8 | +3.4 | +1.2 (D lean) |
| Emerson | 900 | 48.5 | 48.0 | +0.5 | -0.8 (R lean) |
Adjusted margins after removing house effects:
Marist: +1.5 - (-0.3) = +1.8
Quinnipiac: +3.4 - (+1.2) = +2.2
Emerson: +0.5 - (-0.8) = +1.3
Inverse-variance weights (using Dem % as p):
sigma_Marist = sqrt(0.49 * 0.51 / 1250) = 0.01414 → w = 5002
sigma_Quinnipiac = sqrt(0.502 * 0.498 / 1100) = 0.01508 → w = 4400
sigma_Emerson = sqrt(0.485 * 0.515 / 900) = 0.01666 → w = 3603
Weighted average margin:
margin = (5002*1.8 + 4400*2.2 + 3603*1.3) / (5002 + 4400 + 3603)
margin = (9004 + 9680 + 4684) / 13005
margin = 1.80 points
Converting to win probability with sigma_forecast = 3.5 and df = 6:
P(win) = 1 - t.cdf(0, df=6, loc=1.80, scale=3.5)
P(win) = 68.7%
On Polymarket, if the Pennsylvania Democratic YES contract trades at $0.72, the model says it should be $0.69 — a 3-cent edge suggesting the market slightly overprices the Democrat.
Example 3: The Boundary Pricing Problem
Markets near $0.95 systematically underprice tail risk. Consider a market “Will the U.S. hold a presidential election in November 2028?” trading at YES = $0.97 on Kalshi.
Taken at face value, the market-implied probability of this not happening is:
P(NO) = 0.03 = 3%
But the t-distribution reveals the issue. Events priced at $0.97 have historically resolved NO approximately 5-8% of the time in political prediction markets — nearly double what the price implies. The fat tails of political uncertainty (constitutional crises, unprecedented events) mean $0.97 contracts are often mispriced.
An agent should apply a boundary correction:
If market_price > 0.93:
    adjusted_price = 1 - (1 - market_price) * tail_multiplier
    # tail_multiplier typically 1.5-2.5 for political markets
At $0.97 with tail_multiplier = 2.0:
adjusted_price = 1 - (1 - 0.97) * 2.0 = 1 - 0.06 = 0.94
The agent should treat $0.97 as roughly $0.94 — and shorting YES at $0.97 becomes a positive-EV play if the true probability is closer to 94%.
Implementation
import numpy as np
from scipy import stats
from dataclasses import dataclass
@dataclass
class Poll:
"""A single poll with metadata."""
pollster: str
sample_size: int
dem_pct: float # 0-100
rep_pct: float # 0-100
house_effect: float # positive = D lean, negative = R lean
@dataclass
class ElectionForecast:
"""Output of the election model."""
margin: float # positive = D lead
margin_se: float
win_prob: float
model_type: str # "fundamentals", "polls", "combined"
def fundamentals_model(
gdp_growth: float,
unemployment_change: float,
inflation: float,
alpha: float = 48.5,
beta_gdp: float = 0.65,
beta_unemp: float = -1.2,
beta_inflation: float = -0.4,
rmse: float = 3.5
) -> ElectionForecast:
"""
Predict incumbent party two-party vote share from economic fundamentals.
Args:
gdp_growth: Q2 annualized real GDP growth (%).
unemployment_change: Year-over-year change in unemployment rate (pp).
inflation: Year-over-year CPI inflation (%).
alpha: Intercept (baseline incumbent vote share).
beta_gdp: Coefficient on GDP growth.
beta_unemp: Coefficient on unemployment change.
beta_inflation: Coefficient on inflation.
rmse: Historical root mean squared error of the model.
Returns:
ElectionForecast with incumbent two-party vote share as margin from 50%.
"""
vote_share = (
alpha
+ beta_gdp * gdp_growth
+ beta_unemp * unemployment_change
+ beta_inflation * inflation
)
margin = vote_share - 50.0 # margin relative to 50%
# Win probability via t-distribution (df=6 for fat tails)
win_prob = 1.0 - stats.t.cdf(0, df=6, loc=margin, scale=rmse)
return ElectionForecast(
margin=margin,
margin_se=rmse,
win_prob=win_prob,
model_type="fundamentals"
)
def aggregate_polls(
polls: list[Poll],
sigma_forecast: float = 3.5,
df: int = 6
) -> ElectionForecast:
"""
Aggregate polls using inverse-variance weighting with house effect correction.
Args:
polls: List of Poll objects.
sigma_forecast: Total forecast uncertainty (includes systematic error).
df: Degrees of freedom for t-distribution (4-8 typical).
Returns:
ElectionForecast with aggregated margin and win probability.
"""
weights = []
adjusted_margins = []
for poll in polls:
p = poll.dem_pct / 100.0
sigma_i = np.sqrt(p * (1 - p) / poll.sample_size)
w_i = 1.0 / (sigma_i ** 2)
raw_margin = poll.dem_pct - poll.rep_pct
adjusted_margin = raw_margin - poll.house_effect
weights.append(w_i)
adjusted_margins.append(adjusted_margin)
weights = np.array(weights)
adjusted_margins = np.array(adjusted_margins)
# Inverse-variance weighted average
margin = np.sum(weights * adjusted_margins) / np.sum(weights)
    # SE of the pooled share is 1/sqrt(sum(w)) in proportion (0-1) units;
    # convert to a margin in percentage points (margin ~ 2x share, x100)
    margin_se = 200.0 / np.sqrt(np.sum(weights))
# Win probability via t-distribution
win_prob = 1.0 - stats.t.cdf(0, df=df, loc=margin, scale=sigma_forecast)
return ElectionForecast(
margin=margin,
margin_se=margin_se,
win_prob=win_prob,
model_type="polls"
)
def combined_forecast(
fundamentals: ElectionForecast,
polls: ElectionForecast,
days_to_election: int
) -> ElectionForecast:
"""
Combine fundamentals and polls forecasts. Weight shifts from fundamentals
to polls as election approaches.
Args:
fundamentals: Output of fundamentals_model().
polls: Output of aggregate_polls().
days_to_election: Days until election day.
Returns:
Combined ElectionForecast.
"""
# Sigmoid-based weighting: at 180+ days, fundamentals dominate
# At 0 days, polls dominate
polls_weight = 1.0 / (1.0 + np.exp(0.03 * (days_to_election - 90)))
fund_weight = 1.0 - polls_weight
combined_margin = (
fund_weight * fundamentals.margin
+ polls_weight * polls.margin
)
    # Combined SE assuming independent model errors. The polls SE reflects
    # sampling error only, so this can understate total forecast uncertainty.
combined_se = np.sqrt(
(fund_weight * fundamentals.margin_se) ** 2
+ (polls_weight * polls.margin_se) ** 2
)
win_prob = 1.0 - stats.t.cdf(0, df=6, loc=combined_margin, scale=combined_se)
return ElectionForecast(
margin=combined_margin,
margin_se=combined_se,
win_prob=win_prob,
model_type="combined"
)
def simulate_electoral_college(
state_margins: dict[str, float],
state_evs: dict[str, int],
correlation_matrix: np.ndarray,
sigma: float = 3.5,
n_simulations: int = 50_000,
ev_threshold: int = 270
) -> dict:
"""
Monte Carlo simulation of Electoral College outcomes using correlated
state-level results.
Args:
state_margins: Dict of state abbreviation -> expected Dem margin (pp).
state_evs: Dict of state abbreviation -> electoral votes.
correlation_matrix: Correlation matrix for states (same order as state_margins).
sigma: Per-state forecast standard deviation.
n_simulations: Number of Monte Carlo draws.
ev_threshold: Electoral votes needed to win (270 for president).
Returns:
Dict with win_prob, mean_ev, ev_distribution percentiles.
"""
states = list(state_margins.keys())
n_states = len(states)
means = np.array([state_margins[s] for s in states])
sigmas = np.full(n_states, sigma)
# Build covariance matrix from correlation matrix and sigmas
cov_matrix = np.outer(sigmas, sigmas) * correlation_matrix
# Draw correlated state outcomes
rng = np.random.default_rng(42)
draws = rng.multivariate_normal(means, cov_matrix, size=n_simulations)
# Count electoral votes won (margin > 0 = Dem win)
    ev_array = np.array([state_evs[s] for s in states])
    # Vectorized: boolean win matrix (n_simulations x n_states) times EV vector
    ev_counts = (draws > 0) @ ev_array
win_prob = np.mean(ev_counts >= ev_threshold)
return {
"win_prob": float(win_prob),
"mean_ev": float(np.mean(ev_counts)),
"median_ev": float(np.median(ev_counts)),
"ev_p10": float(np.percentile(ev_counts, 10)),
"ev_p90": float(np.percentile(ev_counts, 90)),
"sweep_prob": float(np.mean(ev_counts >= 350)),
"blowout_loss_prob": float(np.mean(ev_counts < 200)),
"n_simulations": n_simulations
}
def boundary_correction(
market_price: float,
tail_multiplier: float = 2.0,
threshold: float = 0.93
) -> float:
"""
Adjust market prices near boundaries (>0.93 or <0.07) to account for
systematic underpricing of tail risk in political markets.
Args:
market_price: Raw market price (0 to 1).
tail_multiplier: How much to inflate tail probability (1.5-2.5 typical).
threshold: Price above which correction is applied.
Returns:
Adjusted probability estimate.
"""
if market_price > threshold:
tail_prob = (1.0 - market_price) * tail_multiplier
return 1.0 - min(tail_prob, 1.0 - threshold)
elif market_price < (1.0 - threshold):
tail_prob = market_price * tail_multiplier
return min(tail_prob, 1.0 - threshold)
return market_price
# --- Demo ---
if __name__ == "__main__":
# Fundamentals model: 2024 economic data
fund = fundamentals_model(
gdp_growth=2.8,
unemployment_change=0.4,
inflation=3.0
)
print(f"Fundamentals Model:")
print(f" Incumbent margin: {fund.margin:+.2f}pp")
print(f" Win probability: {fund.win_prob:.1%}\n")
# Poll aggregation: three PA polls
pa_polls = [
Poll("Marist", 1250, 49.0, 47.5, -0.3),
Poll("Quinnipiac", 1100, 50.2, 46.8, 1.2),
Poll("Emerson", 900, 48.5, 48.0, -0.8),
]
poll_forecast = aggregate_polls(pa_polls, sigma_forecast=3.5, df=6)
print(f"Poll Aggregation (PA):")
print(f" Adjusted margin: {poll_forecast.margin:+.2f}pp")
print(f" Win probability: {poll_forecast.win_prob:.1%}\n")
# Combined forecast at 30 days out
combined = combined_forecast(fund, poll_forecast, days_to_election=30)
print(f"Combined (30 days out):")
print(f" Margin: {combined.margin:+.2f}pp")
print(f" Win probability: {combined.win_prob:.1%}\n")
# Electoral College simulation (simplified: 6 swing states)
swing_states = {
"PA": 19, "MI": 15, "WI": 10,
"AZ": 11, "GA": 16, "NV": 6
}
swing_margins = {
"PA": 1.8, "MI": 2.1, "WI": 1.5,
"AZ": -0.5, "GA": -1.2, "NV": 0.8
}
# Correlation matrix matching the empirical values
corr = np.array([
[1.00, 0.82, 0.78, 0.55, 0.48, 0.60], # PA
[0.82, 1.00, 0.85, 0.50, 0.45, 0.58], # MI
        [0.78, 0.85, 1.00, 0.48, 0.42, 0.55],  # WI
[0.55, 0.50, 0.48, 1.00, 0.72, 0.68], # AZ
[0.48, 0.45, 0.42, 0.72, 1.00, 0.62], # GA
[0.60, 0.58, 0.55, 0.68, 0.62, 1.00], # NV
])
# Note: This simulates only swing state EVs (77 total).
# A full model adds safe states to reach 538.
ec_result = simulate_electoral_college(
swing_margins, swing_states, corr,
sigma=3.5, n_simulations=50_000, ev_threshold=39
# 39 of 77 swing EVs = majority of swing states
)
print(f"Electoral College Sim (swing states only):")
print(f" Win prob (majority of swing EVs): {ec_result['win_prob']:.1%}")
print(f" Mean swing EVs won: {ec_result['mean_ev']:.0f} / 77")
print(f" 90% interval: [{ec_result['ev_p10']:.0f}, {ec_result['ev_p90']:.0f}]")
# Boundary correction
print(f"\nBoundary Corrections:")
for price in [0.95, 0.97, 0.99]:
adj = boundary_correction(price, tail_multiplier=2.0)
print(f" Market ${price:.2f} -> Adjusted ${adj:.2f} (edge: {price - adj:.2f})")
Limitations and Edge Cases
Small sample sizes for fundamentals. The fundamentals regression uses 20 data points (elections since 1948). With 3 predictors, the effective degrees of freedom are low. Coefficient estimates are unstable — adding or removing a single election can shift beta_gdp by 0.2. Treat the fundamentals model as a prior, not a precise forecast.
Herding in polls. Pollsters see each other’s results. Late-cycle polls converge toward the consensus, not because the race has stabilized but because pollsters adjust their likely voter screens to avoid being the outlier. This artificial consensus understates true uncertainty. An agent should inflate sigma_forecast by 10-20% in the final two weeks before an election.
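A sketch of what that inflation does to the headline number, using an illustrative +2.0 margin and df=6:

```python
from scipy import stats

margin, df = 2.0, 6
for sigma in (3.5, 3.5 * 1.15):  # baseline vs. +15% herding adjustment
    p = 1.0 - stats.t.cdf(0, df=df, loc=margin, scale=sigma)
    print(f"sigma={sigma:.2f}: P(win) = {p:.1%}")
```

Inflating sigma pulls the probability toward 50%, which is exactly the extra humility an artificially tight polling consensus calls for.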
Unprecedented candidates and events. The fundamentals model assumes normal elections. Biden dropping out in July 2024 is not in the training data. Candidate-specific shocks (health events, criminal convictions, third-party surges) break the model. An agent should have a circuit breaker: if a major structural event occurs, revert to a wider prior and upweight polls.
Correlation instability. The state correlation matrix estimated from 2000-2024 may not hold for future elections. Demographic realignment (rural-urban polarization accelerating, Sun Belt diversification) changes correlation structure between cycles. Use the historical matrix as a starting point, but an agent should Bayesian-update correlations as early state-level polls arrive.
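One lightweight stand-in for a full Bayesian update is linear shrinkage between the historical matrix and a current-cycle estimate. The weight `lam` and both matrices below are illustrative assumptions, not fitted values:

```python
import numpy as np

hist_corr = np.array([[1.00, 0.82],
                      [0.82, 1.00]])  # e.g., PA-MI estimated from 2000-2024
new_corr = np.array([[1.00, 0.70],
                     [0.70, 1.00]])   # hypothetical current-cycle estimate

lam = 0.3  # trust in the new estimate; grow it as state polls accumulate
updated = (1.0 - lam) * hist_corr + lam * new_corr
print(updated)  # off-diagonal moves from 0.82 toward 0.70
```

A convex combination of valid correlation matrices stays symmetric and positive semi-definite with unit diagonal, so `updated` can feed directly into the Monte Carlo simulation.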
Liquidity traps near expiry. Political prediction markets become illiquid in the final 24-48 hours before resolution. Spreads widen to $0.05-$0.10, and the midpoint no longer reflects true probability. An agent should stop trading or switch to aggressive limit orders well before market close.
FAQ
How do you model elections for prediction markets?
Two primary approaches: fundamentals models use economic indicators (GDP growth, unemployment, inflation) regressed against historical incumbent vote share, and polls-based models aggregate polling data using inverse-variance weighting. The best models combine both, typically weighting fundamentals more heavily far from the election and shifting toward polls as election day approaches.
Why use the t-distribution instead of normal distribution for election probabilities?
Election outcomes have fatter tails than a Gaussian distribution predicts. Historical presidential forecast errors follow a t-distribution with roughly 4-8 degrees of freedom. Using a normal distribution overestimates the favorite's chances — a 3-point polling lead converts to roughly 80% win probability under a normal model but closer to 78% under a t-distribution with df=5 and sigma=3.5.
How do state-level correlations affect presidential election modeling?
Swing states are highly correlated — if Pennsylvania shifts 2 points toward a candidate, Michigan and Wisconsin typically shift approximately 1.5 points in the same direction (correlation approximately 0.75-0.85). Ignoring this correlation dramatically underestimates the probability of a candidate sweeping or losing all swing states simultaneously. A multivariate normal model with a correlation matrix captures this dependency.
What is inverse-variance weighting for poll aggregation?
Each poll is weighted by the inverse of its variance: w_i = 1 / sigma_i^2, where sigma_i = sqrt(p*(1-p)/n) for a poll with sample size n. Larger polls with smaller margins of error receive more weight. This produces a minimum-variance estimate of the true population parameter and is the foundation of every serious poll aggregation model including FiveThirtyEight and Silver Bulletin.
How do political prediction markets differ from sports betting markets?
Political markets have lower event frequency (one presidential election every four years vs. thousands of games per season), higher information asymmetry (internal campaign polls, ground-game data), longer time horizons that lock capital, and systematic boundary pricing errors near $0.95-$0.99 where tail risk is underpriced. Agents need different strategies — see the Bayesian updating guide for belief revision as new information arrives.
What’s Next
Political markets are the highest-stakes application of the intelligence layer. The math here feeds directly into position sizing and risk management.
- Bayesian belief updating: The Bayesian Updating for Prediction Markets guide covers how an agent should revise its election model probability when a new poll drops or an economic indicator is released.
- Monte Carlo for position sizing: The Monte Carlo Simulation guide extends the Electoral College simulation to portfolio-level risk assessment across multiple correlated political markets.
- Pulling live prices: The Prediction Market API Reference documents the Polymarket and Kalshi endpoints for retrieving live election market prices.
- Calibrating your model: The Calibration and Model Evaluation guide shows how to measure whether your election model’s probabilities match observed frequencies.
- Offshore sportsbook political lines: Some offshore sportsbooks offer political prop bets with different vig structures — compare against prediction market prices for cross-platform edge using the Arbitrage Calculator.
