This is the complete Python toolkit for building autonomous betting agents. 30+ libraries organized by function — from numpy for vectorized probability math to py-clob-client for Polymarket order execution. Every library maps to a specific Agent Betting Stack layer. Install what you need, skip what you don’t.

Why This Matters for Agents

An autonomous betting agent is a software system. Software systems run on libraries. Choosing the wrong library — or reinventing functionality that already exists in a battle-tested package — is the fastest way to waste engineering time and introduce bugs into a system that handles real money.

This guide maps the entire Python ecosystem for quantitative betting to the Agent Betting Stack. Every library listed here fills a specific role across the four layers: Layer 1 (Data) ingestion and cleaning, Layer 2 (Wallet) bankroll and risk management, Layer 3 (Trading) API connectivity and order execution, and Layer 4 (Intelligence) modeling and decision logic. The goal is to give an agent developer the exact pip install commands and short usage snippets to get each library running in the context of a betting system — no guessing, no Stack Overflow rabbit holes.

Every math concept in this series has a corresponding Python implementation. The probability distributions cheat sheet uses scipy.stats. The Kelly Criterion uses scipy.optimize. The Poisson model uses numpy. This page is the index that ties them all together.

The Math

Library Selection Criteria

Three rules govern library selection for a production betting agent:

  1. Correctness over speed. A numerically unstable Kelly calculation that runs in 0.1ms is worse than a correct one that takes 10ms. Use scipy.optimize instead of hand-rolled gradient descent.
  2. Maintained packages only. A library with no commits in 18 months is a liability. Every library here had active maintenance as of early 2026.
  3. Minimal dependency chains. Each new dependency is a potential breakage point. Prefer the scientific Python core (NumPy, SciPy, pandas, scikit-learn) over niche wrappers when functionality overlaps.

Agent Betting Stack Layer Map

Layer 4 — Intelligence     scikit-learn, XGBoost, LightGBM, PyMC, statsmodels
                           scipy.stats, scipy.optimize, cvxpy

Layer 3 — Trading          py-clob-client, kalshi_python_sync, python-the-odds-api
                           requests, websockets, aiohttp

Layer 2 — Wallet           numpy (bankroll simulation), scipy.optimize (Kelly)
                           pandas (P&L tracking), cvxpy (position limits)

Layer 1 — Data             nfl_data_py, nba_api, pybaseball, soccerdata
                           pandas, numpy, polars

Each library appears at the layer where its primary function operates. Some span multiple layers — scipy.optimize handles both Kelly sizing (Layer 2) and model parameter fitting (Layer 4).

Core Scientific Stack

NumPy — Vectorized Numerical Operations

NumPy is the foundation. Every other scientific Python library depends on it. For betting agents, NumPy provides vectorized operations that make probability calculations fast enough for real-time decision making.

pip install numpy
import numpy as np

# Vectorized EV calculation across many markets at once (five shown here)
prices = np.array([0.63, 0.41, 0.78, 0.22, 0.55])  # market YES prices
model_probs = np.array([0.70, 0.38, 0.82, 0.30, 0.50])  # agent's estimates

ev = model_probs * (1.0 - prices) - (1 - model_probs) * prices
print(f"EVs: {ev}")
# Positive EV = edge exists
positive_ev = ev[ev > 0]
print(f"Markets with edge: {len(positive_ev)} of {len(prices)}")

NumPy’s numpy.random module powers every Monte Carlo simulation in the series. The Generator API (introduced in NumPy 1.17) provides better statistical properties than the legacy RandomState:

import numpy as np

rng = np.random.default_rng(seed=42)

# Simulate 10,000 bankroll paths over 500 bets
# Each bet: 55% win probability, +100 odds (even money), 5% Kelly stake
n_paths, n_bets = 10_000, 500
win_prob, stake_frac = 0.55, 0.05

bankroll = np.ones((n_paths, n_bets + 1)) * 1000  # start $1,000
outcomes = rng.binomial(1, win_prob, size=(n_paths, n_bets))

for t in range(n_bets):
    bet_size = bankroll[:, t] * stake_frac
    bankroll[:, t + 1] = bankroll[:, t] + np.where(
        outcomes[:, t] == 1, bet_size, -bet_size
    )

median_final = np.median(bankroll[:, -1])
pct_5 = np.percentile(bankroll[:, -1], 5)
print(f"Median final bankroll: ${median_final:,.0f}")
print(f"5th percentile (worst 5%): ${pct_5:,.0f}")

SciPy — Distributions, Optimization, Statistical Tests

SciPy extends NumPy with the specific mathematical functions betting agents need: probability distributions for modeling, optimization for Kelly sizing, and statistical tests for significance testing.

pip install scipy

Probability distributions — used in Poisson modeling, xG models, and the distributions cheat sheet:

from scipy import stats

# Poisson model: predict soccer match goal probabilities
home_xg, away_xg = 1.65, 1.12

home_goals = stats.poisson(mu=home_xg)
away_goals = stats.poisson(mu=away_xg)

# P(home scores exactly 2)
print(f"P(Home=2): {home_goals.pmf(2):.3f}")

# P(total goals over 2.5)
p_over_2_5 = 0.0
for h in range(10):
    for a in range(10):
        if h + a > 2:
            p_over_2_5 += home_goals.pmf(h) * away_goals.pmf(a)
print(f"P(Over 2.5): {p_over_2_5:.3f}")

Optimization — the workhorse for multi-outcome Kelly and portfolio optimization:

from itertools import product
from scipy.optimize import minimize
import numpy as np

def neg_log_growth(fractions, probs, odds):
    """Negative expected log growth for simultaneous Kelly.

    Enumerates every joint win/loss combination of the independent
    bets, since simultaneous stakes draw down the same bankroll.

    Args:
        fractions: bet fraction per outcome (to optimize)
        probs: true probabilities per outcome
        odds: decimal odds per outcome
    """
    growth = 0.0
    for outcome in product([0, 1], repeat=len(probs)):
        prob, wealth = 1.0, 1.0
        for i, won in enumerate(outcome):
            if won:
                prob *= probs[i]
                wealth += fractions[i] * (odds[i] - 1)
            else:
                prob *= 1 - probs[i]
                wealth -= fractions[i]
        growth += prob * np.log(max(wealth, 1e-10))
    return -growth

# Three simultaneous bets
probs = np.array([0.55, 0.62, 0.48])
odds = np.array([2.10, 1.75, 2.30])  # decimal odds

result = minimize(
    neg_log_growth, x0=[0.05, 0.05, 0.05],
    args=(probs, odds),
    bounds=[(0, 0.25)] * 3,  # max 25% per bet
    method="SLSQP"
)
print(f"Optimal Kelly fractions: {result.x}")
print(f"Expected log growth: {-result.fun:.6f}")

pandas — Data Manipulation and Analysis

Every betting dataset — odds histories, play-by-play data, P&L logs — lives in a pandas DataFrame. pandas is Layer 1 infrastructure for data cleaning and Layer 2 infrastructure for bankroll tracking.

pip install pandas
import pandas as pd
import numpy as np

# Track betting P&L with proper bankroll accounting
bets = pd.DataFrame({
    "date": pd.date_range("2026-01-01", periods=8, freq="D"),
    "market": ["LAL -3.5", "BOS ML", "Over 221.5", "GSW +7",
               "NYK -1.5", "MIL ML", "Under 218", "PHX +4"],
    "odds": [-110, -135, -110, +150, -105, -120, -110, +130],
    "stake": [100, 150, 100, 80, 120, 100, 100, 90],
    "result": ["W", "L", "W", "W", "L", "W", "L", "W"]
})

def american_to_payout(odds, stake):
    if odds < 0:
        return stake * (100 / abs(odds))
    return stake * (odds / 100)

bets["payout"] = bets.apply(
    lambda r: american_to_payout(r["odds"], r["stake"]) if r["result"] == "W" else -r["stake"],
    axis=1
)
bets["cumulative_pnl"] = bets["payout"].cumsum()
bets["bankroll"] = 10000 + bets["cumulative_pnl"]

print(bets[["date", "market", "odds", "stake", "result", "payout", "bankroll"]].to_string(index=False))
print(f"\nTotal P&L: ${bets['payout'].sum():+.2f}")
print(f"Win rate: {(bets['result'] == 'W').mean():.1%}")
print(f"ROI: {bets['payout'].sum() / bets['stake'].sum():.1%}")

statsmodels — Regression and Time Series

statsmodels provides the statistical modeling primitives for regression-based prediction and line movement time series analysis.

pip install statsmodels
import statsmodels.api as sm
import pandas as pd
import numpy as np

# Logistic regression: predict NBA game outcomes from basic features
np.random.seed(42)
n = 200
data = pd.DataFrame({
    "home_net_rating": np.random.normal(2.5, 6, n),
    "away_net_rating": np.random.normal(0, 6, n),
    "home_rest_days": np.random.choice([1, 2, 3], n),
    "away_rest_days": np.random.choice([1, 2, 3], n),
})
# Simulated outcomes correlated with net rating differential
diff = data["home_net_rating"] - data["away_net_rating"] + 3  # home advantage
data["home_win"] = (diff + np.random.normal(0, 8, n) > 0).astype(int)

X = sm.add_constant(data[["home_net_rating", "away_net_rating", "home_rest_days", "away_rest_days"]])
model = sm.Logit(data["home_win"], X).fit(disp=0)
print(model.summary2().tables[1])

Machine Learning Libraries

scikit-learn — Classification, Regression, Validation

scikit-learn is the standard ML library for sports prediction models. Its key value for betting agents: TimeSeriesSplit for temporal cross-validation (never use random splits on time-ordered data) and CalibratedClassifierCV for ensuring model outputs are true probabilities, not just rankings. Calibration is critical — see the calibration guide.

pip install scikit-learn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import TimeSeriesSplit
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss
import numpy as np

# Temporal cross-validation for a sports prediction model
np.random.seed(42)
n_games = 1000
X = np.random.randn(n_games, 8)  # 8 features per game
y = (X[:, 0] + 0.5 * X[:, 1] + np.random.randn(n_games) * 0.8 > 0).astype(int)

tscv = TimeSeriesSplit(n_splits=5)
brier_scores = []

for train_idx, test_idx in tscv.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    base_model = GradientBoostingClassifier(
        n_estimators=100, max_depth=3, learning_rate=0.1
    )
    # Platt scaling for probability calibration
    model = CalibratedClassifierCV(base_model, cv=3, method="sigmoid")
    model.fit(X_train, y_train)

    probs = model.predict_proba(X_test)[:, 1]
    brier = brier_score_loss(y_test, probs)
    brier_scores.append(brier)

print(f"Mean Brier Score: {np.mean(brier_scores):.4f} (+/- {np.std(brier_scores):.4f})")
print(f"Perfect calibration = 0.0, coin flip = 0.25")

XGBoost and LightGBM — Gradient Boosting

Gradient-boosted trees are the dominant model class in sports prediction. XGBoost and LightGBM consistently win Kaggle competitions involving tabular sports data. They handle mixed feature types (continuous stats + categorical venue/team), capture nonlinear interactions, and resist overfitting with built-in regularization. See the feature engineering guide for what features to feed them.

pip install xgboost lightgbm
import xgboost as xgb
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# XGBoost for NFL game prediction
np.random.seed(42)
n = 800
X = np.random.randn(n, 12)  # offensive EPA, defensive EPA, turnover margin, etc.
y = (X[:, 0] * 1.5 + X[:, 2] * 0.8 - X[:, 5] + np.random.randn(n) > 0).astype(int)

train_size = int(0.8 * n)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    "objective": "binary:logistic",
    "eval_metric": "logloss",
    "max_depth": 4,
    "learning_rate": 0.05,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "reg_alpha": 0.1,   # L1 regularization
    "reg_lambda": 1.0,  # L2 regularization
}

model = xgb.train(
    params, dtrain, num_boost_round=200,
    evals=[(dtest, "test")], verbose_eval=50
)

probs = model.predict(dtest)
print(f"\nMean predicted probability: {probs.mean():.3f}")
print(f"Actual win rate: {y_test.mean():.3f}")

LightGBM is typically faster than XGBoost on large datasets (>100K rows), thanks to histogram-based splitting and leaf-wise tree growth:

import lightgbm as lgb
import numpy as np

np.random.seed(42)
n = 800
X = np.random.randn(n, 12)
y = (X[:, 0] * 1.5 + X[:, 2] * 0.8 - X[:, 5] + np.random.randn(n) > 0).astype(int)

train_size = int(0.8 * n)
train_data = lgb.Dataset(X[:train_size], label=y[:train_size])
test_data = lgb.Dataset(X[train_size:], label=y[train_size:], reference=train_data)

params = {
    "objective": "binary",
    "metric": "binary_logloss",
    "num_leaves": 31,
    "learning_rate": 0.05,
    "feature_fraction": 0.8,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "verbose": -1,
}

model = lgb.train(params, train_data, num_boost_round=200, valid_sets=[test_data])
probs = model.predict(X[train_size:])
print(f"Mean predicted prob: {probs.mean():.3f}, Actual: {y[train_size:].mean():.3f}")

Prediction Market and Odds APIs

py-clob-client — Polymarket CLOB Access

py-clob-client is the official Python SDK for Polymarket’s Central Limit Order Book on Polygon. It handles authentication, orderbook reads, order placement, and trade history. This is Layer 3 (Trading) infrastructure. See the Polymarket API guide and py-clob-client reference for full documentation.

pip install py-clob-client
from py_clob_client.client import ClobClient

# Read-only access — no API key needed
client = ClobClient(
    host="https://clob.polymarket.com",
    chain_id=137  # Polygon mainnet
)

# Fetch orderbook for a specific market
token_id = "71321045679252212594626385532706912750332728571942532289631379312455583992563"
book = client.get_order_book(token_id)

if book.bids and book.asks:
    best_bid = float(book.bids[0].price)
    best_ask = float(book.asks[0].price)
    midpoint = (best_bid + best_ask) / 2
    spread = best_ask - best_bid
    print(f"Best bid: ${best_bid:.4f}")
    print(f"Best ask: ${best_ask:.4f}")
    print(f"Midpoint (implied prob): {midpoint:.1%}")
    print(f"Spread: ${spread:.4f}")

kalshi_python_sync — Kalshi REST API

kalshi_python_sync wraps the Kalshi REST API with RSA-PSS authentication handling. Kalshi is the only CFTC-regulated prediction market in the US, making it the compliant choice for agents operating in regulated environments. The old kalshi-python package is deprecated. Public market data requires no authentication at all, so the snippet below hits the REST endpoint directly with requests; reach for the SDK once you need authenticated trading endpoints.

pip install kalshi_python_sync
import requests

# Public market data — no auth required
KALSHI_API = "https://api.elections.kalshi.com/trade-api/v2"

resp = requests.get(f"{KALSHI_API}/markets", params={
    "limit": 5,
    "status": "open",
})

if resp.status_code == 200:
    from decimal import Decimal
    for market in resp.json().get("markets", []):
        yes_bid = market.get("yes_bid_dollars", "0")
        yes_ask = market.get("yes_ask_dollars", "0")
        if yes_bid and yes_ask:
            implied = (Decimal(yes_bid) + Decimal(yes_ask)) / 2
            print(f"{market['ticker']}: {implied:.1%}  {market['title'][:70]}")

For the full Kalshi API endpoint reference, see the Prediction Market API Reference and the Kalshi API tool page.

python-the-odds-api — Multi-Sportsbook Odds

The Odds API aggregates real-time odds from 40+ sportsbooks into a single REST endpoint. For agent developers, this is the fastest path to cross-book line comparison and arbitrage detection. It maps directly to the edge detection pipeline. The v4 REST endpoint can also be called directly with requests, as the snippet below does.

pip install python-the-odds-api
import requests

API_KEY = "YOUR_ODDS_API_KEY"  # free tier: 500 requests/month
BASE_URL = "https://api.the-odds-api.com/v4"

# Pull NBA moneylines from all available sportsbooks
resp = requests.get(f"{BASE_URL}/sports/basketball_nba/odds", params={
    "apiKey": API_KEY,
    "regions": "us,us2",
    "markets": "h2h",
    "oddsFormat": "american",
})

if resp.status_code == 200:
    for game in resp.json()[:3]:
        print(f"\n{game['away_team']} @ {game['home_team']}")
        for book in game.get("bookmakers", [])[:4]:
            outcomes = book["markets"][0]["outcomes"]
            home = next(o for o in outcomes if o["name"] == game["home_team"])
            away = next(o for o in outcomes if o["name"] == game["away_team"])
            print(f"  {book['title']:>12}: {away['name']} {away['price']:+d} / {home['name']} {home['price']:+d}")

Cross-book odds enable the arbitrage detection algorithms and feed the AgentBets Vig Index computations. See the arbitrage calculator for a live implementation.
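A minimal sketch of the two-way check behind cross-book arbitrage detection (the prices are illustrative): convert each side's best American price to an implied probability; when the two sides sum to less than 1.0, staking in proportion to the implied probabilities locks in the margin.

```python
def american_to_implied(odds):
    """Convert American odds to implied probability (vig included)."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)

def two_way_arb(best_side_a, best_side_b):
    """Total implied probability across books; < 1.0 means a two-way arb."""
    return american_to_implied(best_side_a) + american_to_implied(best_side_b)

# Best away price at one book, best home price at another
total = two_way_arb(+120, -105)
print(f"Total implied probability: {total:.4f}")
if total < 1.0:
    print(f"Arbitrage margin: {1 - total:.2%}")
```

The same test generalizes to n-outcome markets: sum the best implied probability for every outcome and check whether the total falls below 1.0.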

Sports-Specific Data Libraries

These are Layer 1 (Data) libraries. Each provides sport-specific datasets that feed directly into the modeling layer.

nfl_data_py — NFL Play-by-Play and Stats

nfl_data_py provides access to NFL play-by-play data, seasonal stats, next-gen tracking metrics, and draft picks from nflfastR. This is the dataset behind the NFL mathematical modeling guide.

pip install nfl_data_py
import nfl_data_py as nfl
import pandas as pd

# Load 2025 season play-by-play data
pbp = nfl.import_pbp_data([2025])

# Calculate offensive EPA per play by team (key model feature)
team_epa = (
    pbp[pbp["play_type"].isin(["pass", "run"])]
    .groupby("posteam")["epa"]
    .agg(["mean", "count", "std"])
    .rename(columns={"mean": "epa_per_play", "count": "plays", "std": "epa_std"})
    .sort_values("epa_per_play", ascending=False)
)
print(team_epa.head(10).to_string())

nba_api — NBA Stats and Box Scores

nba_api wraps NBA.com’s stats endpoints. Use it for building the NBA win probability models.

pip install nba_api
from nba_api.stats.endpoints import leaguedashteamstats
import pandas as pd

# Pull 2025-26 team advanced stats
stats = leaguedashteamstats.LeagueDashTeamStats(
    season="2025-26",
    measure_type_detailed_defense="Advanced"
)
df = stats.get_data_frames()[0]
print(df[["TEAM_NAME", "OFF_RATING", "DEF_RATING", "NET_RATING", "PACE"]]
      .sort_values("NET_RATING", ascending=False)
      .head(10)
      .to_string(index=False))

pybaseball — MLB Statcast Data

pybaseball provides pitch-level Statcast data, Fangraphs leaderboards, and Baseball Reference stats. This powers the MLB run expectancy and Markov chain models.

pip install pybaseball
from pybaseball import statcast
import pandas as pd

# Pull one week of Statcast pitch data
data = statcast(start_dt="2025-06-01", end_dt="2025-06-07")

# Calculate average exit velocity and launch angle by batter
batted_balls = data[data["type"] == "X"]  # balls in play
batter_stats = (
    batted_balls.groupby("batter")
    .agg(
        avg_exit_velo=("launch_speed", "mean"),
        avg_launch_angle=("launch_angle", "mean"),
        barrels=("barrel", "sum"),
        batted_balls=("launch_speed", "count"),
    )
    .query("batted_balls >= 15")
    .sort_values("avg_exit_velo", ascending=False)
)
print(batter_stats.head(10).to_string())

soccerdata — European Football Match Data

soccerdata aggregates match results, xG data, and league tables from FBref, Club Elo, and other sources. Use it for the xG betting model and World Cup 2026 projections.

pip install soccerdata
import soccerdata as sd

# Pull Premier League 2025-26 match results from FBref
fbref = sd.FBref(leagues="ENG-Premier League", seasons="2025-2026")
schedule = fbref.read_schedule()
print(schedule[["date", "home_team", "away_team", "score"]].head(10).to_string(index=False))

Bayesian Inference Libraries

PyMC — Probabilistic Programming

PyMC is the go-to library for Bayesian models in sports betting. Hierarchical team strength models, Bayesian Elo, and posterior predictive checks all run through PyMC’s NUTS sampler. Polyseer uses Bayesian aggregation techniques built on this foundation.

pip install pymc arviz
import pymc as pm
import numpy as np
import arviz as az

# Bayesian Poisson model for soccer match goals
np.random.seed(42)
n_matches = 100
home_goals_obs = np.random.poisson(1.6, n_matches)
away_goals_obs = np.random.poisson(1.2, n_matches)

with pm.Model() as goal_model:
    # Priors
    home_rate = pm.Gamma("home_rate", alpha=2, beta=1.5)
    away_rate = pm.Gamma("away_rate", alpha=2, beta=1.5)

    # Likelihoods
    home_goals = pm.Poisson("home_goals", mu=home_rate, observed=home_goals_obs)
    away_goals = pm.Poisson("away_goals", mu=away_rate, observed=away_goals_obs)

    # Sample posterior
    trace = pm.sample(2000, tune=1000, random_seed=42, progressbar=False)

summary = az.summary(trace, var_names=["home_rate", "away_rate"])
print(summary[["mean", "sd", "hdi_3%", "hdi_97%"]])

ArviZ provides model comparison via WAIC and LOO-CV — essential for choosing between competing models. See the scoring rules guide for the mathematical foundations.

Optimization and Portfolio Libraries

cvxpy — Convex Optimization

cvxpy solves the portfolio optimization problem for agents that bet across multiple correlated markets simultaneously. It enforces position constraints that map to Layer 2 wallet limits.

pip install cvxpy
import cvxpy as cp
import numpy as np

# Portfolio optimization: maximize return subject to risk constraint
np.random.seed(42)
n_markets = 5
expected_returns = np.array([0.08, 0.05, 0.12, 0.03, 0.07])
cov_matrix = np.array([
    [0.04, 0.01, 0.02, 0.00, 0.01],
    [0.01, 0.03, 0.01, 0.00, 0.00],
    [0.02, 0.01, 0.06, 0.01, 0.02],
    [0.00, 0.00, 0.01, 0.02, 0.00],
    [0.01, 0.00, 0.02, 0.00, 0.03],
])

weights = cp.Variable(n_markets)
portfolio_return = expected_returns @ weights
portfolio_risk = cp.quad_form(weights, cov_matrix)

# Maximize return subject to: risk <= 5%, weights sum to 1, no shorts, max 40% per position
problem = cp.Problem(
    cp.Maximize(portfolio_return),
    [
        portfolio_risk <= 0.05,
        cp.sum(weights) == 1,
        weights >= 0,
        weights <= 0.40,
    ]
)
problem.solve()

print("Optimal allocation:")
labels = ["Market A", "Market B", "Market C", "Market D", "Market E"]
for label, w in zip(labels, weights.value):
    print(f"  {label}: {w:.1%}")
print(f"Expected return: {portfolio_return.value:.2%}")
print(f"Portfolio risk (variance): {portfolio_risk.value:.4f}")

Visualization Libraries

matplotlib and seaborn — Static Model Diagnostics

matplotlib is the base plotting library. seaborn adds statistical plot types useful for model evaluation: calibration curves, residual plots, and distribution comparisons.

pip install matplotlib seaborn
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Calibration plot: does the model's 60% prediction actually win 60% of the time?
np.random.seed(42)
n = 2000
predicted_probs = np.random.beta(2, 2, n)
actual_outcomes = np.random.binomial(1, predicted_probs * 0.95 + 0.025)  # slightly miscalibrated

n_bins = 10
bin_edges = np.linspace(0, 1, n_bins + 1)
bin_means, bin_actuals = [], []
for i in range(n_bins):
    mask = (predicted_probs >= bin_edges[i]) & (predicted_probs < bin_edges[i + 1])
    if mask.sum() > 0:
        bin_means.append(predicted_probs[mask].mean())
        bin_actuals.append(actual_outcomes[mask].mean())

fig, ax = plt.subplots(figsize=(6, 6))
ax.plot([0, 1], [0, 1], "k--", label="Perfect calibration")
ax.scatter(bin_means, bin_actuals, s=80, zorder=3, label="Model")
ax.set_xlabel("Predicted probability")
ax.set_ylabel("Observed frequency")
ax.set_title("Model Calibration Plot")
ax.legend()
plt.tight_layout()
plt.savefig("calibration_plot.png", dpi=150)
print("Saved calibration_plot.png")

plotly — Interactive Dashboards

plotly creates interactive HTML charts useful for monitoring agent performance in real time. Use it for bankroll tracking dashboards and live odds visualization.

pip install plotly
import plotly.graph_objects as go
import numpy as np

# Interactive bankroll trajectory plot
np.random.seed(42)
days = np.arange(365)
bankroll = 10000 + np.cumsum(np.random.normal(15, 80, 365))

fig = go.Figure()
fig.add_trace(go.Scatter(
    x=days, y=bankroll, mode="lines",
    name="Bankroll", line=dict(color="blue", width=2)
))
fig.add_hline(y=10000, line_dash="dash", line_color="gray", annotation_text="Starting bankroll")
fig.update_layout(
    title="Agent Bankroll Over 365 Days",
    xaxis_title="Day",
    yaxis_title="Bankroll ($)",
)
fig.write_html("bankroll_dashboard.html")
print("Saved interactive dashboard: bankroll_dashboard.html")

Implementation

Here is a complete library installer and verifier that an agent runs at initialization to confirm its Python environment is production-ready:

import subprocess
import importlib
import sys
from dataclasses import dataclass


@dataclass
class LibrarySpec:
    """Specification for a required library."""
    pip_name: str
    import_name: str
    min_version: str
    layer: str
    purpose: str


AGENT_TOOLKIT = [
    LibrarySpec("numpy", "numpy", "1.24.0", "All", "Vectorized numerical operations"),
    LibrarySpec("scipy", "scipy", "1.10.0", "L2/L4", "Distributions, optimization, stats"),
    LibrarySpec("pandas", "pandas", "2.0.0", "L1/L2", "Data manipulation, P&L tracking"),
    LibrarySpec("scikit-learn", "sklearn", "1.3.0", "L4", "ML models, cross-validation"),
    LibrarySpec("xgboost", "xgboost", "2.0.0", "L4", "Gradient-boosted prediction models"),
    LibrarySpec("lightgbm", "lightgbm", "4.0.0", "L4", "Fast gradient boosting"),
    LibrarySpec("statsmodels", "statsmodels", "0.14.0", "L4", "Regression, time series"),
    LibrarySpec("requests", "requests", "2.28.0", "L3", "HTTP client for APIs"),
    LibrarySpec("py-clob-client", "py_clob_client", "0.0.1", "L3", "Polymarket CLOB SDK"),
    LibrarySpec("pymc", "pymc", "5.0.0", "L4", "Bayesian probabilistic programming"),
    LibrarySpec("arviz", "arviz", "0.15.0", "L4", "Bayesian model diagnostics"),
    LibrarySpec("cvxpy", "cvxpy", "1.3.0", "L2/L4", "Convex portfolio optimization"),
    LibrarySpec("matplotlib", "matplotlib", "3.7.0", "All", "Static visualization"),
    LibrarySpec("plotly", "plotly", "5.15.0", "All", "Interactive visualization"),
]


def check_library(spec: LibrarySpec) -> dict:
    """Check if a library is installed and meets version requirements."""
    try:
        mod = importlib.import_module(spec.import_name)
        version = getattr(mod, "__version__", "unknown")
        return {
            "name": spec.pip_name,
            "status": "installed",
            "version": version,
            "layer": spec.layer,
            "purpose": spec.purpose,
        }
    except ImportError:
        return {
            "name": spec.pip_name,
            "status": "missing",
            "version": None,
            "layer": spec.layer,
            "purpose": spec.purpose,
        }


def verify_toolkit() -> None:
    """Verify all required libraries are installed."""
    print("Agent Betting Toolkit — Environment Check")
    print("=" * 65)

    installed, missing = [], []
    for spec in AGENT_TOOLKIT:
        result = check_library(spec)
        if result["status"] == "installed":
            installed.append(result)
            print(f"  [OK]  {result['name']:>20}  v{result['version']:<12}  {result['layer']}")
        else:
            missing.append(result)
            print(f"  [--]  {result['name']:>20}  {'NOT FOUND':<12}  {result['layer']}")

    print("=" * 65)
    print(f"Installed: {len(installed)}/{len(AGENT_TOOLKIT)}")

    if missing:
        names = " ".join(m["name"] for m in missing)
        print(f"\nInstall missing: pip install {names}")


if __name__ == "__main__":
    verify_toolkit()

Limitations and Edge Cases

Version conflicts. PyMC’s compiled backend (PyTensor) and TensorFlow can conflict on shared C-level dependencies such as BLAS builds and compiler toolchains. Run Bayesian and deep learning workloads in separate virtual environments or use conda for dependency isolation.

API rate limits. The Odds API free tier allows 500 requests/month. nba_api throttles to ~1 request/second. py-clob-client has no published rate limit, but aggressive polling (>10 req/s) triggers temporary blocks. Build exponential backoff into every API client.
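The backoff advice can be sketched as a delay schedule (no network calls here; `base_delay` and `cap` are illustrative values): each retry doubles the maximum wait, with full jitter so a fleet of agents doesn't retry in lockstep.

```python
import random

def backoff_delays(max_retries=5, base_delay=1.0, cap=30.0):
    """Exponential backoff with full jitter, capped at `cap` seconds."""
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base_delay * (2 ** attempt))
        delays.append(random.uniform(0, ceiling))  # jitter spreads retries out
    return delays

# In an API client: sleep delays[attempt] after the attempt-th 429/5xx response
print([round(d, 2) for d in backoff_delays()])
```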

Data staleness. nfl_data_py and pybaseball pull from community-maintained data repositories. New season data sometimes lags by days or weeks at season start. During this window, an agent’s model retraining pipeline stalls unless it falls back to cached data.

Sports-specific libraries break. nba_api wraps NBA.com’s undocumented internal API. Endpoints change without notice. Pin your nba_api version and test weekly. pybaseball similarly depends on Baseball Savant’s scraping interface.

Memory constraints. Loading a full NFL play-by-play season (~50K rows, 400+ columns) into pandas uses ~500MB of RAM. Five seasons for model training: ~2.5GB. On memory-constrained agent instances, use polars (a Rust-based DataFrame library) for 3-5x lower memory usage, or load only the columns you need.
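The column-selection point can be sketched with pandas' usecols (the inline CSV is a stand-in for a real play-by-play file): columns you skip are never materialized, so memory scales with what you keep.

```python
import io
import pandas as pd

# Stand-in for a wide play-by-play CSV (real files have 400+ columns)
csv = io.StringIO(
    "game_id,play_type,epa,yards,down,qtr\n"
    "2025_01,pass,0.45,12,1,1\n"
    "2025_01,run,-0.20,2,2,1\n"
)

# usecols filters at parse time; unneeded columns never occupy RAM
pbp = pd.read_csv(csv, usecols=["play_type", "epa", "down"])
print(list(pbp.columns))
```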

Calibration drift. scikit-learn’s CalibratedClassifierCV calibrates on the training set distribution. If the test distribution shifts (new season, rule changes, roster turnover), recalibrate monthly. The calibration guide covers detection and correction methods.
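One simple drift detector, sketched with synthetic data (the baseline value and tolerance are illustrative): compare the live window's Brier score against the validation-era baseline and flag when it degrades. Here outcomes are generated independently of the predictions to simulate a full calibration breakdown.

```python
import numpy as np

rng = np.random.default_rng(42)

baseline_brier = 0.20  # Brier score measured during validation

# Live window: predictions unchanged, but outcomes no longer track them
probs = rng.beta(2, 2, 500)
outcomes = rng.binomial(1, 0.5, 500)  # coin flips: the model's signal has died

recent_brier = np.mean((probs - outcomes) ** 2)
print(f"Recent Brier: {recent_brier:.4f} (baseline {baseline_brier:.4f})")

if recent_brier - baseline_brier > 0.02:  # tolerance is a judgment call
    print("Calibration drift detected: trigger recalibration")
```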

FAQ

What Python libraries do I need for a sports betting model?

The core stack is NumPy for vectorized math, pandas for data manipulation, scikit-learn or XGBoost for model training, and scipy.stats for probability distributions. For data, use nfl_data_py (NFL), nba_api (NBA), or pybaseball (MLB). For odds, use python-the-odds-api to pull lines from 40+ sportsbooks.

How do I connect to Polymarket with Python?

Use the py-clob-client library: pip install py-clob-client. It connects to Polymarket’s CLOB on Polygon, providing orderbook access, order placement, and trade history. Initialize with ClobClient(host='https://clob.polymarket.com', chain_id=137) for read-only access, or add your API key and private key for trading. See the py-clob-client reference for the complete method documentation.

What is the best machine learning algorithm for sports betting predictions?

Gradient-boosted trees (XGBoost, LightGBM) dominate sports prediction competitions and production betting models. They handle mixed feature types, capture nonlinear interactions, and resist overfitting with proper regularization. Use scikit-learn’s TimeSeriesSplit for validation — never random cross-validation on temporal data. The feature engineering guide covers what features to build.

How do I calculate Kelly Criterion bet sizing in Python?

For single bets, use the closed-form formula: f_star = (b * p - q) / b, where b = decimal_odds - 1, p = win probability, q = 1 - p. For simultaneous multi-outcome Kelly, use scipy.optimize.minimize with the negative log-growth objective and position bounds. See the Kelly Criterion guide for the full derivation, simulation results, and fractional Kelly variants.
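The closed-form version reads directly as code (floored at zero so a negative edge means no bet):

```python
def kelly_fraction(p, decimal_odds):
    """Single-bet Kelly: f* = (b*p - q) / b with b = decimal_odds - 1."""
    b = decimal_odds - 1
    q = 1 - p
    return max(0.0, (b * p - q) / b)

print(f"{kelly_fraction(0.55, 2.00):.3f}")  # 55% at even money -> 0.100
print(f"{kelly_fraction(0.45, 2.00):.3f}")  # negative edge -> 0.000
```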

What Python libraries support Bayesian sports modeling?

PyMC is the primary Bayesian modeling library — it supports hierarchical models with NUTS (No-U-Turn Sampler) for efficient posterior sampling. ArviZ handles posterior diagnostics, trace plots, and model comparison via WAIC and LOO-CV. These are essential for building Bayesian Elo systems and hierarchical team strength models. The Bayesian updating guide covers the mathematical foundations.

What’s Next

This page catalogs the tools. The next step is combining them into production pipelines.