Eight steps from raw odds to executed trade: ingest, convert, model, calculate edge (p_model - p_market), filter, Kelly-size, correlation-check, execute. This guide wires together every formula in the series into one runnable pipeline. Edge = model probability minus market probability. If edge > threshold and Kelly says bet, the agent bets.
Why This Matters for Agents
This is the capstone. Every other guide in the Math Behind Betting series derives a formula, proves a theorem, or builds a model component. This guide wires them together into a production pipeline that takes raw odds as input and outputs sized, correlated, executable trade signals.
The pipeline spans all four layers of the Agent Betting Stack. Layer 1 (Access) handles API connections to The Odds API, Polymarket, and Kalshi. Layer 2 (Wallet) enforces bankroll limits, position sizing, and drawdown guards. Layer 3 (Trading) converts signals into orders routed to the correct book. Layer 4 (Intelligence) runs model predictions, edge calculations, and the CLV feedback loop that tells the agent whether its models are actually working. No single layer operates in isolation — the pipeline is the integration layer that connects them.
An agent without a pipeline is a collection of disconnected functions. An agent with a pipeline is a system that turns data into money.
The Math
Pipeline Architecture
Step 1: INGEST Step 2: CONVERT Step 3: MODEL
┌─────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ The Odds API │─────▶│ American → Prob │─────▶│ Elo + Poisson + │
│ Polymarket │ │ Shin vig removal │ │ Regression │
│ Kalshi │ │ Fee adjustment │ │ Ensemble weight │
└─────────────┘ └──────────────────┘ └──────────────────┘
│
Step 8: FEEDBACK Step 7: EXECUTE Step 4: EDGE
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ CLV tracking │◀─│ Route to best │◀─────│ edge_i = │
│ Brier calibrate │ │ book per signal │ │ p_model - p_mkt │
│ Model retrain │ │ Layer 3 execute │ │ per outcome │
└──────────────────┘ └──────────────────┘ └──────────────────┘
▲ │
Step 6: PORTFOLIO Step 5: FILTER+SIZE
┌──────────────────┐ ┌──────────────────┐
│ Correlation chk │◀─────│ edge > threshold │
│ Max heat 15% │ │ Kelly f* sizing │
│ Max position 5% │ │ Quarter-Kelly │
└──────────────────┘ └──────────────────┘
Step 1: Odds Ingestion
The Odds API aggregates lines from 20+ sportsbooks in a single call. Polymarket and Kalshi require separate integrations. An agent polls all three on a cadence — every 30-60 seconds for live markets, every 5-15 minutes for pre-game.
The raw output is a mess: American odds from sportsbooks (-110, +250), dollar share prices from Polymarket ($0.63), cent prices from Kalshi (63). Step 2 normalizes everything.
Step 2: Probability Conversion and Vig Removal
Three conversion formulas, depending on source:
American odds to implied probability:
If odds < 0 (favorite): p = |odds| / (|odds| + 100)
If odds > 0 (underdog): p = 100 / (odds + 100)
Example: -150 → 150/250 = 0.600. +130 → 100/230 = 0.435. Sum = 1.035, so the overround is 3.5%.
Decimal odds to implied probability:
p = 1 / decimal_odds
Prediction market prices are already probabilities (after fee adjustment). See Prediction Market Math 101 for the derivation.
Vig Removal: Shin’s Method
Raw implied probabilities from sportsbooks sum to more than 1.0. The excess is the vig. Naive multiplicative removal (divide each by the sum) distributes vig proportionally across outcomes. That assumption is wrong — sportsbooks shade longshots more heavily than favorites, the well-documented favorite-longshot bias.
Shin’s method models vig as arising from an insider trading fraction z. Given n outcomes with raw implied probabilities pi_raw_i summing to sum_pi = 1 + overround, the true probability for outcome i is:
p_true_i = (sqrt(z^2 + 4*(1-z) * pi_raw_i^2 / sum_pi) - z) / (2*(1-z))
The method solves numerically for the z that makes these adjusted probabilities sum to exactly 1.0. The key advantage: Shin’s method strips more of the overround from longshots than from favorites, matching observed sportsbook behavior. The Sports Betting Math 101 guide covers the derivation in detail. The AgentBets Vig Index publishes live overround data per sportsbook.
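Numerically, the solve is a one-dimensional root-find. A minimal sketch with scipy's brentq, applying the squared-probability form of Shin's estimator to the -150/+130 market above (exact z and adjusted values depend on solver tolerance):

```python
import numpy as np
from scipy.optimize import brentq

def shin_probs(raw: np.ndarray, z: float) -> np.ndarray:
    """Shin-adjusted probabilities for insider fraction z (squared form)."""
    total = raw.sum()
    return (np.sqrt(z**2 + 4 * (1 - z) * raw**2 / total) - z) / (2 * (1 - z))

# The -150 / +130 market from above: raw implied probs sum to 1.035
raw = np.array([150 / 250, 100 / 230])
z_star = brentq(lambda z: shin_probs(raw, z).sum() - 1.0, 1e-10, 0.5)
true = shin_probs(raw, z_star)
print(f"z = {z_star:.4f}, true = {np.round(true, 4)}, sum = {true.sum():.4f}")
```

Note that the Shin-adjusted favorite probability lands above the multiplicative estimate, while the longshot lands below it — the vig is taken disproportionately from the longshot.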
Step 3: Model Prediction
The agent needs its own probability estimate for each outcome. The series covers three core model families:
Elo ratings — team/player strength from head-to-head results. Convert Elo difference to win probability via the logistic function: p = 1 / (1 + 10^(-delta/400)). See Elo Ratings and Power Rankings.
Poisson models — project score totals from average goals/points per game, then derive outcome probabilities from the joint Poisson distribution. See Poisson Distribution and Sports Modeling.
Logistic regression — binary outcome classification using feature vectors (home advantage, rest days, injuries, weather). See Regression Models for Sports Betting.
For prediction markets, Bayesian updating from news events and poll data replaces sports-specific models. Polyseer implements multi-agent Bayesian aggregation for this purpose.
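The Elo-to-probability conversion from the list above is a one-liner; a quick sketch (the 1600/1500 ratings are illustrative):

```python
import math

def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """P(A beats B) from Elo ratings via the logistic formula."""
    delta = rating_a - rating_b
    return 1.0 / (1.0 + 10 ** (-delta / 400))

# A 100-point Elo edge corresponds to roughly a 64% win probability
print(f"{elo_win_prob(1600, 1500):.4f}")  # → 0.6401
```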
Ensemble Weighting
A single model is fragile. Combining models with inverse-variance weighting produces more calibrated estimates:
p_ensemble = Sum_i(w_i * p_i) / Sum_i(w_i)
where w_i = 1 / var_i
var_i is the historical variance of model i’s probability estimates against realized outcomes. A model with lower variance (better calibrated) gets higher weight. See Calibration and Model Evaluation for measuring model variance with Brier decomposition.
Step 4: Edge Calculation
Edge is the gap between what the agent believes and what the market believes:
edge_i = p_model_i - p_market_i
For a bet on outcome i at decimal odds d_i, the expected value is:
EV_i = p_model_i * d_i - 1
EV > 0 is equivalent to edge > 0 when the odds are fair: with vig-free odds d_i = 1/p_market_i, EV_i = p_model_i / p_market_i - 1, which is positive exactly when p_model_i > p_market_i. See Expected Value for Prediction Market Agents for the full EV framework.
The agent computes edge for every outcome on every book simultaneously. A single NFL moneyline with 8 sportsbooks and 2 outcomes (home, away) produces 16 edge calculations — add spreads and totals and the count multiplies. Across a full Sunday slate, that is hundreds of signals.
Step 5: Edge Filter and Kelly Sizing
Not every positive-edge signal is tradeable. The agent applies two filters:
Minimum edge threshold: Sportsbook bets require edge > 2% to cover execution friction (line movement during placement, potential limit reduction). Prediction market bets require edge > 1% due to lower friction.
Kelly criterion for position sizing:
f* = edge / (decimal_odds - 1)
where edge = p_model - p_market. Production agents use quarter-Kelly:
f_actual = f* / 4
Quarter-Kelly retains only about 44% of full-Kelly's expected growth rate but cuts maximum drawdown by more than half. The Kelly Criterion guide proves this tradeoff. The Drawdown Math guide quantifies the variance reduction.
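That 44% figure follows from the standard fractional-Kelly approximation g(c*f*) ≈ (2c - c^2) * g(f*), valid for small edges; a quick check:

```python
def growth_fraction(c: float) -> float:
    """Fraction of full-Kelly expected log-growth retained at c x Kelly.

    Uses the small-edge quadratic approximation g(c*f*) ≈ (2c - c^2) * g(f*).
    """
    return 2 * c - c ** 2

for c in (1.0, 0.5, 0.25):
    print(f"{c:>4} x Kelly → {growth_fraction(c):.0%} of full-Kelly growth")
```

Half-Kelly keeps 75% of the growth; quarter-Kelly keeps 43.75%.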
Step 6: Portfolio Correlation Check
Individual Kelly-sized bets can aggregate into dangerous portfolio-level risk if outcomes are correlated. An agent betting on Lakers -3.5 and Over 220.5 in the same game has correlated positions — if the Lakers win big, both bets are more likely to hit.
The agent must check:
Pairwise correlation between active positions using historical outcome co-occurrence. See Correlation and Portfolio Theory for Multi-Market Agents.
Maximum portfolio heat — total bankroll at risk across all open positions. Cap at 15% of bankroll.
Maximum single-position size — cap at 5% of bankroll regardless of Kelly output.
Same-game parlay risk — correlated legs compound risk non-linearly. See Correlation Risk in Parlays.
If adding a new position would breach any limit, the agent either reduces the position size or skips the signal entirely.
Step 7: Execution Routing
The agent routes each signal to the book offering the best price:
- Polymarket: Execute via py-clob-client limit orders on the Polygon CLOB
- Kalshi: Execute via Kalshi REST API market/limit orders
- Sportsbooks: Execute via offshore sportsbook APIs where available (see offshore sportsbook API hub) or manual placement for books without API access
Best execution means: place the order on the book where p_market is lowest (highest edge) for the target outcome. If BetOnline has Lakers -3.5 at -108 and Bovada has it at -112, route to BetOnline. The Arbitrage Detection guide covers cross-book price comparison in depth.
Step 8: CLV Feedback Loop
After execution, the agent tracks two metrics:
Closing Line Value (CLV): Did the agent beat the closing price? If it bought YES at $0.52 and the market closed at $0.58, that is +6 cents of CLV. Consistent positive CLV is the strongest signal of real edge. See Closing Line Value.
Model calibration: Run Brier score decomposition on the model’s probability outputs versus realized outcomes. A well-calibrated model’s 70% predictions should resolve YES roughly 70% of the time. See Calibration and Model Evaluation.
The feedback loop closes when calibration metrics trigger model retraining. Feature Engineering for Sports Prediction covers which features to add or drop based on predictive signal strength.
Worked Examples
Example 1: NFL Moneyline Edge Detection
The agent pulls NFL Week 12 moneyline odds from The Odds API for Eagles vs. Cowboys:
BetOnline: Eagles -185, Cowboys +155
Bovada: Eagles -180, Cowboys +150
BookMaker: Eagles -175, Cowboys +155
Pinnacle: Eagles -178, Cowboys +158
Step 2 — Convert and remove vig (Shin’s method on Pinnacle as benchmark):
Raw implied: Eagles = 178/278 = 0.6403, Cowboys = 100/258 = 0.3876. Sum = 1.0279, overround = 2.79%.
After Shin’s vig removal (solved numerically): Eagles true probability = 0.6258, Cowboys = 0.3742.
Step 3 — Model prediction:
The agent’s Elo model: Eagles 0.64, Cowboys 0.36. Poisson model: Eagles 0.61, Cowboys 0.39. Logistic regression: Eagles 0.63, Cowboys 0.37.
Inverse-variance ensemble (Elo var=0.04, Poisson var=0.06, Logistic var=0.05):
- w_elo = 25, w_poisson = 16.67, w_logistic = 20
- p_eagles = (25*0.64 + 16.67*0.61 + 20*0.63) / 61.67 = 0.6286
Step 4 — Edge:
Best available line: BookMaker Eagles at -175 → implied 0.6364, Shin-adjusted 0.6210.
edge = 0.6286 - 0.6210 = 0.0076 (0.76%)
Step 5 — Filter: Edge 0.76% < 2% threshold. Signal rejected. The agent does not bet this game.
Example 2: Polymarket Event with Detectable Edge
Market: “Will the Fed cut rates at June 2026 FOMC?” YES trading at $0.41 bid / $0.43 ask on Polymarket.
Step 2: Midpoint = $0.42. Fee-adjusted = 0.42 / (1 - 0.02 * 0.58) = 0.42 / 0.9884 = 0.4249.
Step 3: Agent’s Bayesian model, updated after this morning’s CPI print (core CPI at 2.1%, below consensus 2.4%), estimates P(cut) = 0.54.
Step 4: edge = 0.54 - 0.4249 = 0.1151 (11.5%).
Step 5: Edge 11.5% ≫ 1% threshold. Pass.
Decimal odds for YES at $0.42 = 1/0.42 = 2.381. Kelly: f* = 0.1151 / (2.381 - 1) = 0.0833. Quarter-Kelly: f = 0.0208 (2.08% of bankroll).
With $10,000 bankroll: bet size = $208 on YES at $0.42.
Step 6: Check correlation with existing positions. Agent already holds $150 on “Core PCE below 2.5% in Q2 2026” — these outcomes are correlated (both benefit from cooling inflation). Correlation coefficient estimated at 0.65. Reduce new position by correlation factor: $208 * (1 - 0.65*150/10000) = $206. Minimal adjustment because existing position is small relative to bankroll.
Step 7: Execute a limit buy of 490 YES contracts at $0.42 on Polymarket via py-clob-client ($206 notional).
Step 8: Two weeks later, additional dovish data lands. Market moves to YES $0.58. CLV = +$0.16. The model was right — and the closing line confirms it.
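The arithmetic in this example can be reproduced end to end; a minimal sketch (the 2% winner-fee rate is the assumption used throughout this example):

```python
bankroll = 10_000
mid = 0.42                    # Polymarket YES midpoint
fee_rate = 0.02               # assumed winner-fee rate from the example

# Step 2: fee-adjusted market probability
p_market = mid / (1 - fee_rate * (1 - mid))   # ≈ 0.4249
# Steps 3-4: model probability and edge
p_model = 0.54
edge = p_model - p_market                      # ≈ 0.1151
# Step 5: quarter-Kelly sizing
decimal_odds = 1 / mid                         # ≈ 2.381
f_star = edge / (decimal_odds - 1)             # ≈ 0.0833
stake = bankroll * f_star / 4                  # ≈ $208
print(f"edge={edge:.4f}, f*={f_star:.4f}, stake=${stake:.2f}")
```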
Example 3: Cross-Platform Arbitrage
Agent detects a pricing gap between Kalshi and a sportsbook for “Total points in Super Bowl over/under 49.5”:
Kalshi: Over 49.5 YES at 54¢ (implied 0.54)
BetOnline: Under 49.5 at -105 (implied 0.5122)
Combined cost: 0.54 + 0.5122 = 1.0522. No arb — the sum exceeds 1.0.
But the reverse pairing — BetOnline’s Over 49.5 at +100 (implied 0.50) against Kalshi’s Under YES at 44¢ (implied 0.44):
Combined cost: 0.50 + 0.44 = 0.94. Arb exists. Guaranteed profit = 1.0 - 0.94 = $0.06 per dollar of payout, a 6.4% return on the $0.94 staked.
Route: buy Over at +100 on BetOnline, buy Under YES at 44¢ on Kalshi. The Arbitrage Calculator computes optimal capital allocation between legs.
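Allocating capital between arb legs is mechanical: stake each leg in proportion to its implied probability so every outcome pays the same amount. A minimal sketch (the `arb_stakes` helper is illustrative, not the Arbitrage Calculator itself):

```python
def arb_stakes(probs: list[float], payout: float = 100.0) -> tuple[list[float], float]:
    """Stake each leg so every outcome returns the same total payout.

    probs: implied probabilities of each leg; sum < 1.0 means an arb exists.
    Returns per-leg stakes and the guaranteed profit.
    """
    total = sum(probs)
    if total >= 1.0:
        raise ValueError("no arbitrage: implied probabilities sum to >= 1")
    stakes = [p * payout for p in probs]   # stake = prob x target payout
    profit = payout - sum(stakes)
    return stakes, profit

# BetOnline Over at +100 (0.50) + Kalshi Under YES at 44c (0.44)
stakes, profit = arb_stakes([0.50, 0.44], payout=100.0)
print([round(s, 2) for s in stakes], round(profit, 2))
```

Each leg's stake times its decimal odds (1/prob) returns the same $100 payout, so the $6 profit is locked in regardless of outcome.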
Implementation
"""
End-to-end edge detection pipeline.
Requires: pip install numpy scipy pandas requests
"""
import numpy as np
from scipy.optimize import brentq
from dataclasses import dataclass
import pandas as pd
# ─────────────────────────────────────────────
# Step 1 & 2: Odds Ingestion and Conversion
# ─────────────────────────────────────────────
def american_to_implied(odds: int) -> float:
"""Convert American odds to raw implied probability."""
if odds < 0:
return abs(odds) / (abs(odds) + 100)
else:
return 100 / (odds + 100)
def decimal_to_implied(odds: float) -> float:
"""Convert decimal odds to raw implied probability."""
return 1.0 / odds
def implied_to_decimal(prob: float) -> float:
"""Convert implied probability to decimal odds."""
if prob <= 0:
return float('inf')
return 1.0 / prob
def shin_remove_vig(raw_probs: np.ndarray) -> np.ndarray:
    """
    Remove vig using Shin's method.

    Solves for the insider fraction z such that adjusted probabilities
    sum to 1.0. More accurate than multiplicative normalization for
    sportsbook markets with favorite-longshot bias.

    Parameters:
        raw_probs: array of raw implied probabilities (sum > 1.0)
    Returns:
        array of true probabilities (sum = 1.0)
    """
    total = np.sum(raw_probs)
    if np.isclose(total, 1.0, atol=0.001):
        return raw_probs

    def shin_probs(z: float) -> np.ndarray:
        # Shin's formula: note the SQUARED raw probabilities
        return (np.sqrt(z**2 + 4 * (1 - z) * raw_probs**2 / total) - z) \
            / (2 * (1 - z))

    # z is the insider trading fraction, typically 0 < z < 0.1
    try:
        z_star = brentq(lambda z: np.sum(shin_probs(z)) - 1.0, 1e-10, 0.5)
    except ValueError:
        # Fall back to multiplicative normalization if no root is bracketed
        return raw_probs / total
    return shin_probs(z_star)
def polymarket_fee_adjust(midpoint: float, fee_rate: float = 0.02) -> float:
"""Adjust Polymarket midpoint for winner fee."""
if 0 < midpoint < 1:
return midpoint / (1 - fee_rate * (1 - midpoint))
return midpoint
# ─────────────────────────────────────────────
# Step 3: Model Ensemble
# ─────────────────────────────────────────────
@dataclass
class ModelOutput:
"""Output from a single prediction model."""
name: str
probability: float
variance: float # historical prediction variance
def ensemble_predict(models: list[ModelOutput]) -> float:
"""
Inverse-variance weighted ensemble.
Models with lower historical variance (better calibration)
receive higher weight.
Parameters:
models: list of ModelOutput with probability and variance
Returns:
ensemble probability estimate
"""
weights = np.array([1.0 / m.variance for m in models])
probs = np.array([m.probability for m in models])
return float(np.average(probs, weights=weights))
# ─────────────────────────────────────────────
# Step 4: Edge Calculation
# ─────────────────────────────────────────────
@dataclass
class EdgeSignal:
"""A detected edge opportunity."""
event: str
outcome: str
book: str
market_prob: float
model_prob: float
edge: float
decimal_odds: float
ev_per_dollar: float
def calculate_edges(
event: str,
outcome: str,
book_probs: dict[str, float],
model_prob: float,
) -> list[EdgeSignal]:
"""
Calculate edge for one outcome across all books.
Parameters:
event: event description
outcome: outcome name (e.g., "Eagles ML")
book_probs: dict of book_name -> market implied probability
model_prob: agent's model probability
Returns:
list of EdgeSignal sorted by edge descending
"""
signals = []
for book, mkt_prob in book_probs.items():
edge = model_prob - mkt_prob
dec_odds = implied_to_decimal(mkt_prob)
ev = model_prob * dec_odds - 1.0
signals.append(EdgeSignal(
event=event,
outcome=outcome,
book=book,
market_prob=mkt_prob,
model_prob=model_prob,
edge=edge,
decimal_odds=dec_odds,
ev_per_dollar=ev,
))
signals.sort(key=lambda s: s.edge, reverse=True)
return signals
# ─────────────────────────────────────────────
# Step 5: Filter and Kelly Sizing
# ─────────────────────────────────────────────
@dataclass
class SizedBet:
"""A filtered and sized bet ready for portfolio check."""
signal: EdgeSignal
kelly_fraction: float
quarter_kelly: float
bet_amount: float
def filter_and_size(
signals: list[EdgeSignal],
bankroll: float,
min_edge_sportsbook: float = 0.02,
min_edge_predmarket: float = 0.01,
kelly_fraction: float = 0.25,
max_position_pct: float = 0.05,
is_prediction_market: bool = False,
) -> list[SizedBet]:
"""
Filter by minimum edge, size with fractional Kelly.
Parameters:
signals: list of EdgeSignal (typically from one outcome, multiple books)
bankroll: current bankroll in dollars
min_edge_sportsbook: minimum edge for sportsbook bets
min_edge_predmarket: minimum edge for prediction market bets
kelly_fraction: Kelly multiplier (0.25 = quarter-Kelly)
max_position_pct: maximum single position as fraction of bankroll
is_prediction_market: True for Polymarket/Kalshi signals
Returns:
list of SizedBet passing all filters
"""
threshold = min_edge_predmarket if is_prediction_market else min_edge_sportsbook
sized = []
for sig in signals:
if sig.edge < threshold:
continue
if sig.decimal_odds <= 1.0:
continue
# Kelly: f* = edge / (odds - 1)
full_kelly = sig.edge / (sig.decimal_odds - 1.0)
frac_kelly = full_kelly * kelly_fraction
# Cap at max position size
frac_kelly = min(frac_kelly, max_position_pct)
bet_amt = bankroll * frac_kelly
sized.append(SizedBet(
signal=sig,
kelly_fraction=full_kelly,
quarter_kelly=frac_kelly,
bet_amount=round(bet_amt, 2),
))
return sized
# ─────────────────────────────────────────────
# Step 6: Portfolio Correlation Check
# ─────────────────────────────────────────────
@dataclass
class Position:
"""An existing or proposed portfolio position."""
event: str
outcome: str
amount: float
correlation_group: str # e.g., "NFL_week12", "Fed_June"
def portfolio_check(
new_bet: SizedBet,
existing_positions: list[Position],
correlation_matrix: dict[tuple[str, str], float],
bankroll: float,
max_heat_pct: float = 0.15,
) -> tuple[float, bool]:
"""
Check portfolio constraints and adjust position size.
Parameters:
new_bet: proposed bet
existing_positions: current open positions
correlation_matrix: pairwise correlation between event groups
bankroll: current bankroll
max_heat_pct: maximum total exposure as fraction of bankroll
Returns:
(adjusted_amount, is_approved)
"""
# Current total exposure
current_heat = sum(p.amount for p in existing_positions)
max_heat = bankroll * max_heat_pct
# Room for new position
room = max_heat - current_heat
if room <= 0:
return 0.0, False
# Adjust for correlation with existing positions
adjusted_amount = new_bet.bet_amount
new_group = new_bet.signal.event
for pos in existing_positions:
corr_key = (new_group, pos.correlation_group)
reverse_key = (pos.correlation_group, new_group)
corr = correlation_matrix.get(corr_key,
correlation_matrix.get(reverse_key, 0.0))
if corr > 0.3:
# Reduce new position proportional to correlated exposure
reduction = corr * (pos.amount / bankroll)
adjusted_amount *= (1 - reduction)
# Cap at remaining room
adjusted_amount = min(adjusted_amount, room)
return round(adjusted_amount, 2), adjusted_amount > 0
# ─────────────────────────────────────────────
# Step 8: CLV and Calibration Tracking
# ─────────────────────────────────────────────
@dataclass
class TradeRecord:
"""Completed trade for feedback analysis."""
event: str
outcome: str
entry_prob: float
closing_prob: float
model_prob: float
result: int # 1 = win, 0 = loss
def compute_clv(trades: list[TradeRecord]) -> pd.DataFrame:
"""
Compute CLV statistics across all trades.
Returns DataFrame with per-trade CLV and summary stats.
"""
records = []
for t in trades:
clv = t.closing_prob - t.entry_prob
records.append({
"event": t.event,
"entry_prob": t.entry_prob,
"closing_prob": t.closing_prob,
"clv": clv,
"clv_pct": clv * 100,
"result": t.result,
})
df = pd.DataFrame(records)
return df
def brier_score(predictions: np.ndarray, outcomes: np.ndarray) -> float:
"""
Brier score: mean squared error of probability predictions.
Lower is better. Perfect = 0.0, coin flip = 0.25.
Parameters:
predictions: array of predicted probabilities
outcomes: array of binary outcomes (0 or 1)
Returns:
Brier score
"""
return float(np.mean((predictions - outcomes) ** 2))
def calibration_bins(
predictions: np.ndarray,
outcomes: np.ndarray,
n_bins: int = 10,
) -> pd.DataFrame:
"""
Bin predictions and compare predicted vs actual frequency.
A perfectly calibrated model has predicted_mean == actual_mean
in every bin.
"""
bin_edges = np.linspace(0, 1, n_bins + 1)
bins = np.digitize(predictions, bin_edges) - 1
bins = np.clip(bins, 0, n_bins - 1)
rows = []
for i in range(n_bins):
mask = bins == i
if mask.sum() == 0:
continue
rows.append({
"bin_lower": bin_edges[i],
"bin_upper": bin_edges[i + 1],
"count": int(mask.sum()),
"predicted_mean": float(predictions[mask].mean()),
"actual_mean": float(outcomes[mask].mean()),
"gap": float(predictions[mask].mean() - outcomes[mask].mean()),
})
return pd.DataFrame(rows)
# ─────────────────────────────────────────────
# Full Pipeline Runner
# ─────────────────────────────────────────────
def run_pipeline_example():
"""
Demonstrate the full pipeline with sample data.
"""
print("=" * 60)
print("EDGE DETECTION PIPELINE — SAMPLE RUN")
print("=" * 60)
# Step 1: Raw odds (simulating The Odds API response)
odds_data = {
"event": "Eagles vs Cowboys — Moneyline",
"books": {
"BetOnline": {"Eagles": -185, "Cowboys": 155},
"Bovada": {"Eagles": -180, "Cowboys": 150},
"BookMaker": {"Eagles": -175, "Cowboys": 155},
"Pinnacle": {"Eagles": -178, "Cowboys": 158},
}
}
# Step 2: Convert and remove vig
print("\n--- Step 2: Vig Removal (Shin's Method) ---")
all_book_probs = {}
for book, lines in odds_data["books"].items():
raw = np.array([american_to_implied(lines["Eagles"]),
american_to_implied(lines["Cowboys"])])
true = shin_remove_vig(raw)
all_book_probs[book] = {
"Eagles": float(true[0]),
"Cowboys": float(true[1]),
}
print(f" {book:12s}: raw [{raw[0]:.4f}, {raw[1]:.4f}] "
f"sum={raw.sum():.4f} → shin [{true[0]:.4f}, {true[1]:.4f}]")
# Step 3: Model ensemble
print("\n--- Step 3: Model Ensemble ---")
models = [
ModelOutput("Elo", 0.64, 0.040),
ModelOutput("Poisson", 0.61, 0.060),
ModelOutput("Logistic", 0.63, 0.050),
]
p_model = ensemble_predict(models)
print(f" Elo: {models[0].probability:.2f} (var={models[0].variance})")
print(f" Poisson: {models[1].probability:.2f} (var={models[1].variance})")
print(f" Logistic: {models[2].probability:.2f} (var={models[2].variance})")
print(f" Ensemble: {p_model:.4f}")
# Step 4: Edge calculation
print("\n--- Step 4: Edge Calculation ---")
eagles_probs = {b: p["Eagles"] for b, p in all_book_probs.items()}
edges = calculate_edges(
odds_data["event"], "Eagles ML", eagles_probs, p_model
)
for e in edges:
print(f" {e.book:12s}: mkt={e.market_prob:.4f} "
f"edge={e.edge:+.4f} EV=${e.ev_per_dollar:+.4f}/dollar")
# Step 5: Filter and size
print("\n--- Step 5: Filter & Kelly Sizing ---")
bankroll = 10000
sized = filter_and_size(edges, bankroll, is_prediction_market=False)
if sized:
for s in sized:
print(f" BET: {s.signal.book} — ${s.bet_amount:.2f} "
f"(Kelly={s.kelly_fraction:.4f}, "
f"1/4 Kelly={s.quarter_kelly:.4f})")
else:
print(" No bets pass the 2% edge threshold. Pipeline exits.")
# Now demonstrate with a prediction market signal
print("\n" + "=" * 60)
print("PREDICTION MARKET SIGNAL")
print("=" * 60)
pm_mid = 0.42
pm_adjusted = polymarket_fee_adjust(pm_mid)
pm_model = 0.54
pm_edge = pm_model - pm_adjusted
print(f"\n Market: Fed June 2026 Rate Cut — YES at ${pm_mid:.2f}")
print(f" Fee-adjusted: {pm_adjusted:.4f}")
print(f" Model prob: {pm_model:.2f}")
print(f" Edge: {pm_edge:.4f} ({pm_edge*100:.1f}%)")
pm_signals = [EdgeSignal(
event="Fed Rate Cut June 2026",
outcome="YES",
book="Polymarket",
market_prob=pm_adjusted,
model_prob=pm_model,
edge=pm_edge,
decimal_odds=implied_to_decimal(pm_mid),
ev_per_dollar=pm_model * implied_to_decimal(pm_mid) - 1.0,
)]
pm_sized = filter_and_size(
pm_signals, bankroll, is_prediction_market=True
)
for s in pm_sized:
print(f"\n BET: {s.signal.book} — ${s.bet_amount:.2f} "
f"(full Kelly={s.kelly_fraction:.4f}, "
f"quarter Kelly={s.quarter_kelly:.4f})")
# Step 8: Simulated CLV tracking
print("\n" + "=" * 60)
print("FEEDBACK LOOP — CLV ANALYSIS")
print("=" * 60)
sample_trades = [
TradeRecord("Fed Cut Jun", "YES", 0.42, 0.58, 0.54, 1),
TradeRecord("Eagles ML", "Eagles", 0.62, 0.64, 0.63, 1),
TradeRecord("Lakers +3.5", "Lakers", 0.48, 0.45, 0.50, 0),
TradeRecord("BTC > 100k Dec", "YES", 0.35, 0.41, 0.40, 1),
TradeRecord("Ohtani HR", "YES", 0.28, 0.31, 0.33, 0),
]
clv_df = compute_clv(sample_trades)
print(f"\n Average CLV: {clv_df['clv'].mean():+.4f} "
f"({clv_df['clv_pct'].mean():+.2f}%)")
print(f" CLV > 0 rate: {(clv_df['clv'] > 0).mean():.0%}")
print(f" Win rate: {clv_df['result'].mean():.0%}")
preds = np.array([t.model_prob for t in sample_trades])
outs = np.array([t.result for t in sample_trades])
bs = brier_score(preds, outs)
print(f" Brier score: {bs:.4f} (lower is better, 0.25 = coin flip)")
if __name__ == "__main__":
run_pipeline_example()
Limitations and Edge Cases
Model risk is the dominant failure mode. The pipeline is only as good as Step 3. If the model’s probability estimates are miscalibrated — consistently overestimating or underestimating — the edge calculation produces false signals. Garbage in, garbage out. The Brier score feedback loop (Step 8) catches this, but only after enough trades to reach statistical significance. See Statistical Significance in Sports Betting for minimum sample sizes — you need 1,000+ bets to distinguish skill from luck at typical edge levels.
Shin’s method assumes a specific market structure. It models the bookmaker’s overround as arising from informed trading. For prediction markets with order-book-based pricing (Polymarket CLOB), Shin’s model doesn’t apply — the vig arises from the bid-ask spread, not from a bookmaker’s margin. Use midpoint estimation for prediction markets and Shin’s method for sportsbooks.
Correlation estimation is hard. Step 6 requires a correlation matrix between outcomes. For same-sport events (two NFL games on the same Sunday), correlation is near zero. For same-game outcomes (moneyline and total), correlation can exceed 0.5. For macro events (Fed rate decisions affecting multiple markets), correlations shift rapidly. Historical correlation is a lagging indicator — regime changes break the matrix.
Execution slippage destroys thin edges. A 2% edge on a sportsbook line evaporates if the line moves 2 points between signal detection and order placement. Latency matters. An agent running this pipeline on a 60-second polling interval will miss edges that a 1-second polling agent captures. For betting bots in production, sub-second execution is table stakes.
API rate limits and account limits. The Odds API free tier allows 500 requests/month. Sportsbooks limit sharp accounts. Polymarket has no account limits but has gas costs on Polygon. An agent must budget API calls and diversify across books to avoid being limited at any single venue. BookMaker is known for higher limits on sharp action; MyBookie limits aggressively.
Overfitting the ensemble. Inverse-variance weighting works when model errors are uncorrelated. If all three models use the same underlying features (recent form, home advantage), their errors correlate and the ensemble provides less diversification than the weights suggest. See Monte Carlo Simulation for Prediction Markets for stress-testing ensemble robustness.
FAQ
How do you build an automated betting pipeline from odds to trade execution?
An automated betting pipeline follows eight steps: ingest odds from APIs, convert to implied probabilities and remove vig, generate model predictions, calculate edge (model probability minus market probability), filter by minimum edge threshold, size bets using Kelly criterion, check portfolio correlation limits, and execute through the appropriate trading interface. Each step has specific math that feeds into the next. The full pipeline spans all four layers of the Agent Betting Stack.
What is Shin’s method for removing vig from sportsbook odds?
Shin’s method removes vig while accounting for an insider trading fraction z in the market. It solves for z such that the sum of adjusted probabilities equals 1.0, using the formula p_true_i = (sqrt(z^2 + 4*(1-z)*pi_raw_i^2/sum_pi) - z) / (2*(1-z)). It produces more accurate probabilities than simple multiplicative normalization, especially in markets with a pronounced favorite-longshot bias, where sportsbooks shade longshots more heavily than favorites.
How does Kelly criterion fit into an edge detection pipeline?
Kelly criterion is Step 5 in the pipeline — it sizes bets after edge is detected. The formula f* = edge / (decimal_odds - 1) determines what fraction of bankroll to wager. Production agents use quarter-Kelly (f*/4) to reduce variance. Kelly requires positive edge as input, so it only activates after the edge filter confirms the signal exceeds the minimum threshold.
What is closing line value and why does it matter for betting agents?
Closing line value (CLV) measures whether your agent consistently beats the final market price. If your agent buys YES at $0.52 and the market closes at $0.58, that is 6 cents of CLV. Positive CLV over hundreds of bets is the strongest evidence that your model has real predictive edge, because closing lines incorporate all available information. The CLV guide covers the math in full.
How do you detect edge across multiple sportsbooks and prediction markets?
Pull odds from all available sources via The Odds API, Polymarket CLOB, and Kalshi REST endpoints. Convert each to implied probability, remove vig using Shin’s method for sportsbooks and midpoint estimation for prediction markets, then compare against your model’s probability estimate. Edge equals model probability minus market probability. An agent scans all books simultaneously and routes execution to the book offering the best price for each signal.
What’s Next
This pipeline is the integration layer. Each component has its own deep-dive guide:
- Model improvement: Feature Engineering for Sports Prediction covers which inputs make your Step 3 models more accurate.
- Significance testing: Statistical Significance in Sports Betting tells you how many bets you need before trusting your CLV and calibration metrics.
- Full API reference: The Prediction Market API Reference documents every endpoint for Polymarket, Kalshi, and other platforms your pipeline ingests from.
- Live vig data: The AgentBets Vig Index feeds directly into Step 2 — use it to identify which books have the lowest vig and therefore offer the best starting prices.
- Explore the full series: Browse all 40 guides in the Math Behind Betting series to strengthen any individual pipeline component.
