Eight steps from raw odds to executed trade: ingest, convert, model, calculate edge (p_model - p_market), filter, Kelly-size, correlation-check, execute. This guide wires together every formula in the series into one runnable pipeline. Edge = model probability minus market probability. If edge > threshold and Kelly says bet, the agent bets.
Why This Matters for Agents
This is the capstone. Every other guide in the Math Behind Betting series derives a formula, proves a theorem, or builds a model component. This guide wires them together into a production pipeline that takes raw odds as input and outputs sized, correlated, executable trade signals.
The pipeline spans all four layers of the Agent Betting Stack. Layer 1 (Access) handles API connections to The Odds API, Polymarket, and Kalshi. Layer 2 (Wallet) enforces bankroll limits, position sizing, and drawdown guards. Layer 3 (Trading) converts signals into orders routed to the correct book. Layer 4 (Intelligence) runs model predictions, edge calculations, and the CLV feedback loop that tells the agent whether its models are actually working. No single layer operates in isolation — the pipeline is the integration layer that connects them.
An agent without a pipeline is a collection of disconnected functions. An agent with a pipeline is a system that turns data into money.
The Math
Pipeline Architecture
Step 1: INGEST Step 2: CONVERT Step 3: MODEL
┌─────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ The Odds API │─────▶│ American → Prob │─────▶│ Elo + Poisson + │
│ Polymarket │ │ Shin vig removal │ │ Regression │
│ Kalshi │ │ Fee adjustment │ │ Ensemble weight │
└─────────────┘ └──────────────────┘ └──────────────────┘
│
Step 8: FEEDBACK Step 7: EXECUTE Step 4: EDGE
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ CLV tracking │◀─│ Route to best │◀─────│ edge_i = │
│ Brier calibrate │ │ book per signal │ │ p_model - p_mkt │
│ Model retrain │ │ Layer 3 execute │ │ per outcome │
└──────────────────┘ └──────────────────┘ └──────────────────┘
▲ │
Step 6: PORTFOLIO Step 5: FILTER+SIZE
┌──────────────────┐ ┌──────────────────┐
│ Correlation chk │◀─────│ edge > threshold │
│ Max heat 15% │ │ Kelly f* sizing │
│ Max position 5% │ │ Quarter-Kelly │
└──────────────────┘ └──────────────────┘
Step 1: Odds Ingestion
The Odds API aggregates lines from 20+ sportsbooks in a single call. Polymarket and Kalshi require separate integrations. An agent polls all three on a cadence — every 30-60 seconds for live markets, every 5-15 minutes for pre-game.
The raw output is a mess: American odds from sportsbooks (-110, +250), dollar share prices from Polymarket ($0.63), cent prices from Kalshi (63). Step 2 normalizes everything.
Step 2: Probability Conversion and Vig Removal
Three conversion formulas, depending on source:
American odds to implied probability:
If odds < 0 (favorite): p = |odds| / (|odds| + 100)
If odds > 0 (underdog): p = 100 / (odds + 100)
Example: -150 → 150/250 = 0.600. +130 → 100/230 = 0.435. Sum = 1.035, so the overround is 3.5%.
Decimal odds to implied probability:
p = 1 / decimal_odds
Prediction market prices are already probabilities (after fee adjustment). See Prediction Market Math 101 for the derivation.
Vig Removal: Shin’s Method
Raw implied probabilities from sportsbooks sum to more than 1.0. The excess is the vig. Naive multiplicative removal (divide each by the sum) distributes vig proportionally across outcomes. That assumption is wrong — sportsbooks shade longshots more heavily than favorites, the well-documented favorite-longshot bias.
Shin’s method models vig as arising from an insider trading fraction z. Given n outcomes with raw implied probabilities pi_raw_i summing to sum_pi = 1 + overround, the true probability for outcome i is:
p_true_i = (sqrt(z^2 + 4*(1-z) * pi_raw_i^2 / sum_pi) - z) / (2*(1-z))
The method solves numerically for the z that makes these adjusted probabilities sum to exactly 1.0. The key advantage: Shin’s method strips more of the overround from longshots than from favorites, matching observed sportsbook behavior. The Sports Betting Math 101 guide covers the derivation in detail. The AgentBets Vig Index publishes live overround data per sportsbook.
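Numerically, the solve is a one-dimensional root-find. A minimal sketch with scipy's brentq, applying the squared-probability form of Shin's estimator to the -150/+130 market above (exact z and adjusted values depend on solver tolerance):

```python
import numpy as np
from scipy.optimize import brentq

def shin_probs(raw: np.ndarray, z: float) -> np.ndarray:
    """Shin-adjusted probabilities for insider fraction z (squared form)."""
    total = raw.sum()
    return (np.sqrt(z**2 + 4 * (1 - z) * raw**2 / total) - z) / (2 * (1 - z))

# The -150 / +130 market from above: raw implied probs sum to 1.035
raw = np.array([150 / 250, 100 / 230])
z_star = brentq(lambda z: shin_probs(raw, z).sum() - 1.0, 1e-10, 0.5)
true = shin_probs(raw, z_star)
print(f"z = {z_star:.4f}, true = {np.round(true, 4)}, sum = {true.sum():.4f}")
```

Note that the Shin-adjusted favorite probability lands above the multiplicative estimate, while the longshot lands below it — the vig is taken disproportionately from the longshot.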
Step 3: Model Prediction
The agent needs its own probability estimate for each outcome. The series covers three core model families:
Elo ratings — team/player strength from head-to-head results. Convert Elo difference to win probability via the logistic function: p = 1 / (1 + 10^(-delta/400)). See Elo Ratings and Power Rankings.
Poisson models — project score totals from average goals/points per game, then derive outcome probabilities from the joint Poisson distribution. See Poisson Distribution and Sports Modeling.
Logistic regression — binary outcome classification using feature vectors (home advantage, rest days, injuries, weather). See Regression Models for Sports Betting.
For prediction markets, Bayesian updating from news events and poll data replaces sports-specific models. Polyseer implements multi-agent Bayesian aggregation for this purpose.
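The Elo-to-probability conversion from the list above is a one-liner; a quick sketch (the 1600/1500 ratings are illustrative):

```python
import math

def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """P(A beats B) from Elo ratings via the logistic formula."""
    delta = rating_a - rating_b
    return 1.0 / (1.0 + 10 ** (-delta / 400))

# A 100-point Elo edge corresponds to roughly a 64% win probability
print(f"{elo_win_prob(1600, 1500):.4f}")  # → 0.6401
```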
Ensemble Weighting
A single model is fragile. Combining models with inverse-variance weighting produces more calibrated estimates:
p_ensemble = Sum_i(w_i * p_i) / Sum_i(w_i)
where w_i = 1 / var_i
var_i is the historical variance of model i’s probability estimates against realized outcomes. A model with lower variance (better calibrated) gets higher weight. See Calibration and Model Evaluation for measuring model variance with Brier decomposition.
Step 4: Edge Calculation
Edge is the gap between what the agent believes and what the market believes:
edge_i = p_model_i - p_market_i
For a bet on outcome i at decimal odds d_i, the expected value is:
EV_i = p_model_i * d_i - 1
EV > 0 is equivalent to edge > 0 when the odds are fair: with vig-free odds d_i = 1/p_market_i, EV_i = p_model_i / p_market_i - 1, which is positive exactly when p_model_i > p_market_i. See Expected Value for Prediction Market Agents for the full EV framework.
The agent computes edge for every outcome on every book simultaneously. A single NFL moneyline with 8 sportsbooks and 2 outcomes (home, away) produces 16 edge calculations — add spreads and totals and the count multiplies. Across a full Sunday slate, that is hundreds of signals.
Step 5: Edge Filter and Kelly Sizing
Not every positive-edge signal is tradeable. The agent applies two filters:
Minimum edge threshold: Sportsbook bets require edge > 2% to cover execution friction (line movement during placement, potential limit reduction). Prediction market bets require edge > 1% due to lower friction.
Kelly criterion for position sizing:
f* = edge / (decimal_odds - 1)
where edge = p_model - p_market. Production agents use quarter-Kelly:
f_actual = f* / 4
Quarter-Kelly retains only about 44% of full-Kelly's expected growth rate but cuts maximum drawdown by more than half. The Kelly Criterion guide proves this tradeoff. The Drawdown Math guide quantifies the variance reduction.
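That 44% figure follows from the standard fractional-Kelly approximation g(c*f*) ≈ (2c - c^2) * g(f*), valid for small edges; a quick check:

```python
def growth_fraction(c: float) -> float:
    """Fraction of full-Kelly expected log-growth retained at c x Kelly.

    Uses the small-edge quadratic approximation g(c*f*) ≈ (2c - c^2) * g(f*).
    """
    return 2 * c - c ** 2

for c in (1.0, 0.5, 0.25):
    print(f"{c:>4} x Kelly → {growth_fraction(c):.0%} of full-Kelly growth")
```

Half-Kelly keeps 75% of the growth; quarter-Kelly keeps 43.75%.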
Step 6: Portfolio Correlation Check
Individual Kelly-sized bets can aggregate into dangerous portfolio-level risk if outcomes are correlated. An agent betting on Lakers -3.5 and Over 220.5 in the same game has correlated positions — if the Lakers win big, both bets are more likely to hit.
The agent must check:
Pairwise correlation between active positions using historical outcome co-occurrence. See Correlation and Portfolio Theory for Multi-Market Agents.
Maximum portfolio heat — total bankroll at risk across all open positions. Cap at 15% of bankroll.
Maximum single-position size — cap at 5% of bankroll regardless of Kelly output.
Same-game parlay risk — correlated legs compound risk non-linearly. See Correlation Risk in Parlays.
If adding a new position would breach any limit, the agent either reduces the position size or skips the signal entirely.
Step 7: Execution Routing
The agent routes each signal to the book offering the best price:
- Polymarket: Execute via py-clob-client limit orders on the Polygon CLOB
- Kalshi: Execute via Kalshi REST API market/limit orders
- Sportsbooks: Execute via offshore sportsbook APIs where available (see offshore sportsbook API hub) or manual placement for books without API access
Best execution means: place the order on the book where p_market is lowest (highest edge) for the target outcome. If BetOnline has Lakers -3.5 at -108 and Bovada has it at -112, route to BetOnline. The Arbitrage Detection guide covers cross-book price comparison in depth.
Step 8: CLV Feedback Loop
After execution, the agent tracks two metrics:
Closing Line Value (CLV): Did the agent beat the closing price? If it bought YES at $0.52 and the market closed at $0.58, that is +6 cents of CLV. Consistent positive CLV is the strongest signal of real edge. See Closing Line Value.
Model calibration: Run Brier score decomposition on the model’s probability outputs versus realized outcomes. A well-calibrated model’s 70% predictions should resolve YES roughly 70% of the time. See Calibration and Model Evaluation.
The feedback loop closes when calibration metrics trigger model retraining. Feature Engineering for Sports Prediction covers which features to add or drop based on predictive signal strength.
Worked Examples
Example 1: NFL Moneyline Edge Detection
The agent pulls NFL Week 12 moneyline odds from The Odds API for Eagles vs. Cowboys:
BetOnline: Eagles -185, Cowboys +155
Bovada: Eagles -180, Cowboys +150
BookMaker: Eagles -175, Cowboys +155
Pinnacle: Eagles -178, Cowboys +158
Step 2 — Convert and remove vig (Shin’s method on Pinnacle as benchmark):
Raw implied: Eagles = 178/278 = 0.6403, Cowboys = 100/258 = 0.3876. Sum = 1.0279, overround = 2.79%.
After Shin’s vig removal (solved numerically): Eagles true probability = 0.6258, Cowboys = 0.3742.
Step 3 — Model prediction:
The agent’s Elo model: Eagles 0.64, Cowboys 0.36. Poisson model: Eagles 0.61, Cowboys 0.39. Logistic regression: Eagles 0.63, Cowboys 0.37.
Inverse-variance ensemble (Elo var=0.04, Poisson var=0.06, Logistic var=0.05):
- w_elo = 25, w_poisson = 16.67, w_logistic = 20
- p_eagles = (25*0.64 + 16.67*0.61 + 20*0.63) / 61.67 = 0.6286
Step 4 — Edge:
Best available line: BookMaker Eagles at -175 → implied 0.6364, Shin-adjusted 0.6210.
edge = 0.6286 - 0.6210 = 0.0076 (0.76%)
Step 5 — Filter: Edge 0.76% < 2% threshold. Signal rejected. The agent does not bet this game.
Example 2: Polymarket Event with Detectable Edge
Market: “Will the Fed cut rates at June 2026 FOMC?” YES trading at $0.41 bid / $0.43 ask on Polymarket.
Step 2: Midpoint = $0.42. Fee-adjusted = 0.42 / (1 - 0.02 * 0.58) = 0.42 / 0.9884 = 0.4249.
Step 3: Agent’s Bayesian model, updated after this morning’s CPI print (core CPI at 2.1%, below consensus 2.4%), estimates P(cut) = 0.54.
Step 4: edge = 0.54 - 0.4249 = 0.1151 (11.5%).
Step 5: Edge 11.5% ≫ 1% threshold. Pass.
Decimal odds for YES at $0.42 = 1/0.42 = 2.381. Kelly: f* = 0.1151 / (2.381 - 1) = 0.0833. Quarter-Kelly: f = 0.0208 (2.08% of bankroll).
With $10,000 bankroll: bet size = $208 on YES at $0.42.
Step 6: Check correlation with existing positions. Agent already holds $150 on “Core PCE below 2.5% in Q2 2026” — these outcomes are correlated (both benefit from cooling inflation). Correlation coefficient estimated at 0.65. Reduce new position by correlation factor: $208 * (1 - 0.65*150/10000) = $206. Minimal adjustment because existing position is small relative to bankroll.
Step 7: Execute a limit buy of 490 YES contracts at $0.42 on Polymarket via py-clob-client ($206 notional).
Step 8: Two weeks later, additional dovish data lands. Market moves to YES $0.58. CLV = +$0.16. The model was right — and the closing line confirms it.
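The arithmetic in this example can be reproduced end to end; a minimal sketch (the 2% winner-fee rate is the assumption used throughout this example):

```python
bankroll = 10_000
mid = 0.42                    # Polymarket YES midpoint
fee_rate = 0.02               # assumed winner-fee rate from the example

# Step 2: fee-adjusted market probability
p_market = mid / (1 - fee_rate * (1 - mid))   # ≈ 0.4249
# Steps 3-4: model probability and edge
p_model = 0.54
edge = p_model - p_market                      # ≈ 0.1151
# Step 5: quarter-Kelly sizing
decimal_odds = 1 / mid                         # ≈ 2.381
f_star = edge / (decimal_odds - 1)             # ≈ 0.0833
stake = bankroll * f_star / 4                  # ≈ $208
print(f"edge={edge:.4f}, f*={f_star:.4f}, stake=${stake:.2f}")
```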
Example 3: Cross-Platform Arbitrage
Agent detects a pricing gap between Kalshi and a sportsbook for “Total points in Super Bowl over/under 49.5”:
Kalshi: Over 49.5 YES at 54¢ (implied 0.54)
BetOnline: Under 49.5 at -105 (implied 0.5122)
Combined cost: 0.54 + 0.5122 = 1.0522. No arb — the sum exceeds 1.0.
But the reverse pairing — BetOnline’s Over 49.5 at +100 (implied 0.50) against Kalshi’s Under YES at 44¢ (implied 0.44):
Combined cost: 0.50 + 0.44 = 0.94. Arb exists. Guaranteed profit = 1.0 - 0.94 = $0.06 per dollar of payout, a 6.4% return on the $0.94 staked.
Route: buy Over at +100 on BetOnline, buy Under YES at 44¢ on Kalshi. The Arbitrage Calculator computes optimal capital allocation between legs.
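Allocating capital between arb legs is mechanical: stake each leg in proportion to its implied probability so every outcome pays the same amount. A minimal sketch (the `arb_stakes` helper is illustrative, not the Arbitrage Calculator itself):

```python
def arb_stakes(probs: list[float], payout: float = 100.0) -> tuple[list[float], float]:
    """Stake each leg so every outcome returns the same total payout.

    probs: implied probabilities of each leg; sum < 1.0 means an arb exists.
    Returns per-leg stakes and the guaranteed profit.
    """
    total = sum(probs)
    if total >= 1.0:
        raise ValueError("no arbitrage: implied probabilities sum to >= 1")
    stakes = [p * payout for p in probs]   # stake = prob x target payout
    profit = payout - sum(stakes)
    return stakes, profit

# BetOnline Over at +100 (0.50) + Kalshi Under YES at 44c (0.44)
stakes, profit = arb_stakes([0.50, 0.44], payout=100.0)
print([round(s, 2) for s in stakes], round(profit, 2))
```

Each leg's stake times its decimal odds (1/prob) returns the same $100 payout, so the $6 profit is locked in regardless of outcome.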
Implementation
"""
End-to-end edge detection pipeline.
Requires: pip install numpy scipy pandas requests
"""
import numpy as np
from scipy.optimize import brentq
from dataclasses import dataclass
import pandas as pd
# ─────────────────────────────────────────────
# Step 1 & 2: Odds Ingestion and Conversion
# ─────────────────────────────────────────────
def american_to_implied(odds: int) -> float:
"""Convert American odds to raw implied probability."""
if odds < 0:
return abs(odds) / (abs(odds) + 100)
else:
return 100 / (odds + 100)
def decimal_to_implied(odds: float) -> float:
"""Convert decimal odds to raw implied probability."""
return 1.0 / odds
def implied_to_decimal(prob: float) -> float:
"""Convert implied probability to decimal odds."""
if prob <= 0:
return float('inf')
return 1.0 / prob
def shin_remove_vig(raw_probs: np.ndarray) -> np.ndarray:
    """
    Remove vig using Shin's method.

    Solves for the insider fraction z such that adjusted probabilities
    sum to 1.0. More accurate than multiplicative normalization for
    sportsbook markets with favorite-longshot bias.

    Parameters:
        raw_probs: array of raw implied probabilities (sum > 1.0)
    Returns:
        array of true probabilities (sum = 1.0)
    """
    total = np.sum(raw_probs)
    if np.isclose(total, 1.0, atol=0.001):
        return raw_probs

    def shin_probs(z: float) -> np.ndarray:
        # Shin's formula: note the SQUARED raw probabilities
        return (np.sqrt(z**2 + 4 * (1 - z) * raw_probs**2 / total) - z) \
            / (2 * (1 - z))

    # z is the insider trading fraction, typically 0 < z < 0.1
    try:
        z_star = brentq(lambda z: np.sum(shin_probs(z)) - 1.0, 1e-10, 0.5)
    except ValueError:
        # Fall back to multiplicative normalization if no root is bracketed
        return raw_probs / total
    return shin_probs(z_star)
def polymarket_fee_adjust(midpoint: float, fee_rate: float = 0.02) -> float:
"""Adjust Polymarket midpoint for winner fee."""
if 0 < midpoint < 1:
return midpoint / (1 - fee_rate * (1 - midpoint))
return midpoint
# ─────────────────────────────────────────────
# Step 3: Model Ensemble
# ─────────────────────────────────────────────
@dataclass
class ModelOutput:
"""Output from a single prediction model."""
name: str
probability: float
variance: float # historical prediction variance
def ensemble_predict(models: list[ModelOutput]) -> float:
"""
Inverse-variance weighted ensemble.
Models with lower historical variance (better calibration)
receive higher weight.
Parameters:
models: list of ModelOutput with probability and variance
Returns:
ensemble probability estimate
"""
weights = np.array([1.0 / m.variance for m in models])
probs = np.array([m.probability for m in models])
return float(np.average(probs, weights=weights))
# ─────────────────────────────────────────────
# Step 4: Edge Calculation
# ─────────────────────────────────────────────
@dataclass
class EdgeSignal:
"""A detected edge opportunity."""
event: str
outcome: str
book: str
market_prob: float
model_prob: float
edge: float
decimal_odds: float
ev_per_dollar: float
def calculate_edges(
event: str,
outcome: str,
book_probs: dict[str, float],
model_prob: float,
) -> list[EdgeSignal]:
"""
Calculate edge for one outcome across all books.
Parameters:
event: event description
outcome: outcome name (e.g., "Eagles ML")
book_probs: dict of book_name -> market implied probability
model_prob: agent's model probability
Returns:
list of EdgeSignal sorted by edge descending
"""
signals = []
for book, mkt_prob in book_probs.items():
edge = model_prob - mkt_prob
dec_odds = implied_to_decimal(mkt_prob)
ev = model_prob * dec_odds - 1.0
signals.append(EdgeSignal(
event=event,
outcome=outcome,
book=book,
market_prob=mkt_prob,
model_prob=model_prob,
edge=edge,
decimal_odds=dec_odds,
ev_per_dollar=ev,
))
signals.sort(key=lambda s: s.edge, reverse=True)
return signals
# ─────────────────────────────────────────────
# Step 5: Filter and Kelly Sizing
# ─────────────────────────────────────────────
@dataclass
class SizedBet:
"""A filtered and sized bet ready for portfolio check."""
signal: EdgeSignal
kelly_fraction: float
quarter_kelly: float
bet_amount: float
def filter_and_size(
signals: list[EdgeSignal],
bankroll: float,
min_edge_sportsbook: float = 0.02,
min_edge_predmarket: float = 0.01,
kelly_fraction: float = 0.25,
max_position_pct: float = 0.05,
is_prediction_market: bool = False,
) -> list[SizedBet]:
"""
Filter by minimum edge, size with fractional Kelly.
Parameters:
signals: list of EdgeSignal (typically from one outcome, multiple books)
bankroll: current bankroll in dollars
min_edge_sportsbook: minimum edge for sportsbook bets
min_edge_predmarket: minimum edge for prediction market bets
kelly_fraction: Kelly multiplier (0.25 = quarter-Kelly)
max_position_pct: maximum single position as fraction of bankroll
is_prediction_market: True for Polymarket/Kalshi signals
Returns:
list of SizedBet passing all filters
"""
threshold = min_edge_predmarket if is_prediction_market else min_edge_sportsbook
sized = []
for sig in signals:
if sig.edge < threshold:
continue
if sig.decimal_odds <= 1.0:
continue
# Kelly: f* = edge / (odds - 1)
full_kelly = sig.edge / (sig.decimal_odds - 1.0)
frac_kelly = full_kelly * kelly_fraction
# Cap at max position size
frac_kelly = min(frac_kelly, max_position_pct)
bet_amt = bankroll * frac_kelly
sized.append(SizedBet(
signal=sig,
kelly_fraction=full_kelly,
quarter_kelly=frac_kelly,
bet_amount=round(bet_amt, 2),
))
return sized
# ─────────────────────────────────────────────
# Step 6: Portfolio Correlation Check
# ─────────────────────────────────────────────
@dataclass
class Position:
"""An existing or proposed portfolio position."""
event: str
outcome: str
amount: float
correlation_group: str # e.g., "NFL_week12", "Fed_June"
def portfolio_check(
new_bet: SizedBet,
existing_positions: list[Position],
correlation_matrix: dict[tuple[str, str], float],
bankroll: float,
max_heat_pct: float = 0.15,
) -> tuple[float, bool]:
"""
Check portfolio constraints and adjust position size.
Parameters:
new_bet: proposed bet
existing_positions: current open positions
correlation_matrix: pairwise correlation between event groups
bankroll: current bankroll
max_heat_pct: maximum total exposure as fraction of bankroll
Returns:
(adjusted_amount, is_approved)
"""
# Current total exposure
current_heat = sum(p.amount for p in existing_positions)
max_heat = bankroll * max_heat_pct
# Room for new position
room = max_heat - current_heat
if room <= 0:
return 0.0, False
# Adjust for correlation with existing positions
adjusted_amount = new_bet.bet_amount
new_group = new_bet.signal.event
for pos in existing_positions:
corr_key = (new_group, pos.correlation_group)
reverse_key = (pos.correlation_group, new_group)
corr = correlation_matrix.get(corr_key,
correlation_matrix.get(reverse_key, 0.0))
if corr > 0.3:
# Reduce new position proportional to correlated exposure
reduction = corr * (pos.amount / bankroll)
adjusted_amount *= (1 - reduction)
# Cap at remaining room
adjusted_amount = min(adjusted_amount, room)
return round(adjusted_amount, 2), adjusted_amount > 0
# ─────────────────────────────────────────────
# Step 8: CLV and Calibration Tracking
# ─────────────────────────────────────────────
@dataclass
class TradeRecord:
"""Completed trade for feedback analysis."""
event: str
outcome: str
entry_prob: float
closing_prob: float
model_prob: float
result: int # 1 = win, 0 = loss
def compute_clv(trades: list[TradeRecord]) -> pd.DataFrame:
"""
Compute CLV statistics across all trades.
Returns DataFrame with per-trade CLV and summary stats.
"""
records = []
for t in trades:
clv = t.closing_prob - t.entry_prob
records.append({
"event": t.event,
"entry_prob": t.entry_prob,
"closing_prob": t.closing_prob,
"clv": clv,
"clv_pct": clv * 100,
"result": t.result,
})
df = pd.DataFrame(records)
return df
def brier_score(predictions: np.ndarray, outcomes: np.ndarray) -> float:
"""
Brier score: mean squared error of probability predictions.
Lower is better. Perfect = 0.0, coin flip = 0.25.
Parameters:
predictions: array of predicted probabilities
outcomes: array of binary outcomes (0 or 1)
Returns:
Brier score
"""
return float(np.mean((predictions - outcomes) ** 2))
def calibration_bins(
predictions: np.ndarray,
outcomes: np.ndarray,
n_bins: int = 10,
) -> pd.DataFrame:
"""
Bin predictions and compare predicted vs actual frequency.
A perfectly calibrated model has predicted_mean == actual_mean
in every bin.
"""
bin_edges = np.linspace(0, 1, n_bins + 1)
bins = np.digitize(predictions, bin_edges) - 1
bins = np.clip(bins, 0, n_bins - 1)
rows = []
for i in range(n_bins):
mask = bins == i
if mask.sum() == 0:
continue
rows.append({
"bin_lower": bin_edges[i],
"bin_upper": bin_edges[i + 1],
"count": int(mask.sum()),
"predicted_mean": float(predictions[mask].mean()),
"actual_mean": float(outcomes[mask].mean()),
"gap": float(predictions[mask].mean() - outcomes[mask].mean()),
})
return pd.DataFrame(rows)
# ─────────────────────────────────────────────
# Full Pipeline Runner
# ─────────────────────────────────────────────
def run_pipeline_example():
"""
Demonstrate the full pipeline with sample data.
"""
print("=" * 60)
print("EDGE DETECTION PIPELINE — SAMPLE RUN")
print("=" * 60)
# Step 1: Raw odds (simulating The Odds API response)
odds_data = {
"event": "Eagles vs Cowboys — Moneyline",
"books": {
"BetOnline": {"Eagles": -185, "Cowboys": 155},
"Bovada": {"Eagles": -180, "Cowboys": 150},
"BookMaker": {"Eagles": -175, "Cowboys": 155},
"Pinnacle": {"Eagles": -178, "Cowboys": 158},
}
}
# Step 2: Convert and remove vig
print("\n--- Step 2: Vig Removal (Shin's Method) ---")
all_book_probs = {}
for book, lines in odds_data["books"].items():
raw = np.array([american_to_implied(lines["Eagles"]),
american_to_implied(lines["Cowboys"])])
true = shin_remove_vig(raw)
all_book_probs[book] = {
"Eagles": float(true[0]),
"Cowboys": float(true[1]),
}
print(f" {book:12s}: raw [{raw[0]:.4f}, {raw[1]:.4f}] "
f"sum={raw.sum():.4f} → shin [{true[0]:.4f}, {true[1]:.4f}]")
# Step 3: Model ensemble
print("\n--- Step 3: Model Ensemble ---")
models = [
ModelOutput("Elo", 0.64, 0.040),
ModelOutput("Poisson", 0.61, 0.060),
ModelOutput("Logistic", 0.63, 0.050),
]
p_model = ensemble_predict(models)
print(f" Elo: {models[0].probability:.2f} (var={models[0].variance})")
print(f" Poisson: {models[1].probability:.2f} (var={models[1].variance})")
print(f" Logistic: {models[2].probability:.2f} (var={models[2].variance})")
print(f" Ensemble: {p_model:.4f}")
# Step 4: Edge calculation
print("\n--- Step 4: Edge Calculation ---")
eagles_probs = {b: p["Eagles"] for b, p in all_book_probs.items()}
edges = calculate_edges(
odds_data["event"], "Eagles ML", eagles_probs, p_model
)
for e in edges:
print(f" {e.book:12s}: mkt={e.market_prob:.4f} "
f"edge={e.edge:+.4f} EV=${e.ev_per_dollar:+.4f}/dollar")
# Step 5: Filter and size
print("\n--- Step 5: Filter & Kelly Sizing ---")
bankroll = 10000
sized = filter_and_size(edges, bankroll, is_prediction_market=False)
if sized:
for s in sized:
print(f" BET: {s.signal.book} — ${s.bet_amount:.2f} "
f"(Kelly={s.kelly_fraction:.4f}, "
f"1/4 Kelly={s.quarter_kelly:.4f})")
else:
print(" No bets pass the 2% edge threshold. Pipeline exits.")
# Now demonstrate with a prediction market signal
print("\n" + "=" * 60)
print("PREDICTION MARKET SIGNAL")
print("=" * 60)
pm_mid = 0.42
pm_adjusted = polymarket_fee_adjust(pm_mid)
pm_model = 0.54
pm_edge = pm_model - pm_adjusted
print(f"\n Market: Fed June 2026 Rate Cut — YES at ${pm_mid:.2f}")
print(f" Fee-adjusted: {pm_adjusted:.4f}")
print(f" Model prob: {pm_model:.2f}")
print(f" Edge: {pm_edge:.4f} ({pm_edge*100:.1f}%)")
pm_signals = [EdgeSignal(
event="Fed Rate Cut June 2026",
outcome="YES",
book="Polymarket",
market_prob=pm_adjusted,
model_prob=pm_model,
edge=pm_edge,
decimal_odds=implied_to_decimal(pm_mid),
ev_per_dollar=pm_model * implied_to_decimal(pm_mid) - 1.0,
)]
pm_sized = filter_and_size(
pm_signals, bankroll, is_prediction_market=True
)
for s in pm_sized:
print(f"\n BET: {s.signal.book} — ${s.bet_amount:.2f} "
f"(full Kelly={s.kelly_fraction:.4f}, "
f"quarter Kelly={s.quarter_kelly:.4f})")
# Step 8: Simulated CLV tracking
print("\n" + "=" * 60)
print("FEEDBACK LOOP — CLV ANALYSIS")
print("=" * 60)
sample_trades = [
TradeRecord("Fed Cut Jun", "YES", 0.42, 0.58, 0.54, 1),
TradeRecord("Eagles ML", "Eagles", 0.62, 0.64, 0.63, 1),
TradeRecord("Lakers +3.5", "Lakers", 0.48, 0.45, 0.50, 0),
TradeRecord("BTC > 100k Dec", "YES", 0.35, 0.41, 0.40, 1),
TradeRecord("Ohtani HR", "YES", 0.28, 0.31, 0.33, 0),
]
clv_df = compute_clv(sample_trades)
print(f"\n Average CLV: {clv_df['clv'].mean():+.4f} "
f"({clv_df['clv_pct'].mean():+.2f}%)")
print(f" CLV > 0 rate: {(clv_df['clv'] > 0).mean():.0%}")
print(f" Win rate: {clv_df['result'].mean():.0%}")
preds = np.array([t.model_prob for t in sample_trades])
outs = np.array([t.result for t in sample_trades])
bs = brier_score(preds, outs)
print(f" Brier score: {bs:.4f} (lower is better, 0.25 = coin flip)")
if __name__ == "__main__":
run_pipeline_example()
Limitations and Edge Cases
Model risk is the dominant failure mode. The pipeline is only as good as Step 3. If the model’s probability estimates are miscalibrated — consistently overestimating or underestimating — the edge calculation produces false signals. Garbage in, garbage out. The Brier score feedback loop (Step 8) catches this, but only after enough trades to reach statistical significance. See Statistical Significance in Sports Betting for minimum sample sizes — you need 1,000+ bets to distinguish skill from luck at typical edge levels.
Shin’s method assumes a specific market structure. It models the bookmaker’s overround as arising from informed trading. For prediction markets with order-book-based pricing (Polymarket CLOB), Shin’s model doesn’t apply — the vig arises from the bid-ask spread, not from a bookmaker’s margin. Use midpoint estimation for prediction markets and Shin’s method for sportsbooks.
Correlation estimation is hard. Step 6 requires a correlation matrix between outcomes. For same-sport events (two NFL games on the same Sunday), correlation is near zero. For same-game outcomes (moneyline and total), correlation can exceed 0.5. For macro events (Fed rate decisions affecting multiple markets), correlations shift rapidly. Historical correlation is a lagging indicator — regime changes break the matrix.
Execution slippage destroys thin edges. A 2% edge on a sportsbook line evaporates if the line moves 2 points between signal detection and order placement. Latency matters. An agent running this pipeline on a 60-second polling interval will miss edges that a 1-second polling agent captures. For betting bots in production, sub-second execution is table stakes.
API rate limits and account limits. The Odds API free tier allows 500 requests/month. Sportsbooks limit sharp accounts. Polymarket has no account limits but has gas costs on Polygon. An agent must budget API calls and diversify across books to avoid being limited at any single venue. BookMaker is known for higher limits on sharp action; MyBookie limits aggressively.
Overfitting the ensemble. Inverse-variance weighting works when model errors are uncorrelated. If all three models use the same underlying features (recent form, home advantage), their errors correlate and the ensemble provides less diversification than the weights suggest. See Monte Carlo Simulation for Prediction Markets for stress-testing ensemble robustness.
FAQ
How do you build an automated betting pipeline from odds to trade execution?
An automated betting pipeline follows eight steps: ingest odds from APIs, convert to implied probabilities and remove vig, generate model predictions, calculate edge (model probability minus market probability), filter by minimum edge threshold, size bets using Kelly criterion, check portfolio correlation limits, and execute through the appropriate trading interface. Each step has specific math that feeds into the next. The full pipeline spans all four layers of the Agent Betting Stack.
What is Shin’s method for removing vig from sportsbook odds?
Shin’s method removes vig while accounting for an insider trading fraction z in the market. It solves for z such that the sum of adjusted probabilities equals 1.0, using the formula p_true_i = (sqrt(z^2 + 4*(1-z)*pi_raw_i^2/sum_pi) - z) / (2*(1-z)). It produces more accurate probabilities than simple multiplicative normalization, especially in markets with a pronounced favorite-longshot bias, where sportsbooks shade longshots more heavily than favorites.
How does Kelly criterion fit into an edge detection pipeline?
Kelly criterion is Step 5 in the pipeline — it sizes bets after edge is detected. The formula f* = edge / (decimal_odds - 1) determines what fraction of bankroll to wager. Production agents use quarter-Kelly (f*/4) to reduce variance. Kelly requires positive edge as input, so it only activates after the edge filter confirms the signal exceeds the minimum threshold.
What is closing line value and why does it matter for betting agents?
Closing line value (CLV) measures whether your agent consistently beats the final market price. If your agent buys YES at $0.52 and the market closes at $0.58, that is 6 cents of CLV. Positive CLV over hundreds of bets is the strongest evidence that your model has real predictive edge, because closing lines incorporate all available information. The CLV guide covers the math in full.
How do you detect edge across multiple sportsbooks and prediction markets?
Pull odds from all available sources via The Odds API, Polymarket CLOB, and Kalshi REST endpoints. Convert each to implied probability, remove vig using Shin’s method for sportsbooks and midpoint estimation for prediction markets, then compare against your model’s probability estimate. Edge equals model probability minus market probability. An agent scans all books simultaneously and routes execution to the book offering the best price for each signal.
What’s Next
This pipeline is the integration layer. Each component has its own deep-dive guide:
- Model improvement: Feature Engineering for Sports Prediction covers which inputs make your Step 3 models more accurate.
- Significance testing: Statistical Significance in Sports Betting tells you how many bets you need before trusting your CLV and calibration metrics.
- Full API reference: The Prediction Market API Reference documents every endpoint for Polymarket, Kalshi, and other platforms your pipeline ingests from.
- Live vig data: The AgentBets Vig Index feeds directly into Step 2 — use it to identify which books have the lowest vig and therefore offer the best starting prices.
- Explore the full series: Browse all 40 guides in the Math Behind Betting series to strengthen any individual pipeline component.
