Prediction market API integration is where most agent-based betting pipelines break down. After building a liquidity scoring system across Polymarket and Kalshi, we catalogued seven recurring mistakes that cost weeks of debugging time — from field name mismatches to silent memory kills on serverless platforms. This post covers each failure and the specific fix.
API Documentation Lies
The single most common source of bugs in multi-platform prediction market pipelines is field name mismatch. You code against documentation, deploy, and get zeroes everywhere with no error messages.
Polymarket’s Gamma API documents snake_case fields like condition_id and volume_num. The actual endpoint returns camelCase: conditionId, volumeNum, negRisk. Worse, the clobTokenIds field looks like an array but is actually a JSON-encoded string: a string containing a JSON array containing strings. Double-serialized. Iterate it as if it were an array and you get characters, so your parser passes "[" as a token ID, every orderbook fetch 404s, and the logs fill with cryptic failures.
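A minimal sketch of the unwrap step, using the camelCase field names from the live response described above (the interface is illustrative, not the full market object):

```typescript
interface GammaMarket {
  conditionId: string;
  volumeNum: number;
  negRisk: boolean;
  clobTokenIds: string; // a JSON string, NOT an array: '["0xabc…","0xdef…"]'
}

function parseClobTokenIds(raw: string): string[] {
  // One JSON.parse turns the double-serialized field into a real array.
  const decoded = JSON.parse(raw);
  // Guard against the failure mode above: iterating the raw string
  // would yield single characters like "[" as token IDs.
  if (!Array.isArray(decoded) || !decoded.every((t) => typeof t === "string")) {
    throw new Error(`clobTokenIds did not decode to string[]: ${raw}`);
  }
  return decoded;
}
```

The explicit array check is the point: a silent wrong-type pass-through is exactly how the "[" token ID reaches your orderbook fetcher.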
Kalshi has its own naming traps. Volume lives in volume_24h_fp (a string, not a number). The orderbook data sits under orderbook_fp.yes_dollars and orderbook_fp.no_dollars, not the orderbook.yes path you might expect. Prices arrive in integer cents, not dollar floats.
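A sketch of the Kalshi-side normalization, assuming the stringly-typed fields described above (the record shape is simplified; verify against live responses):

```typescript
interface KalshiMarketRaw {
  volume_24h_fp: string; // volume arrives as a string, not a number
}

function parseVolume(raw: KalshiMarketRaw): number {
  const v = Number(raw.volume_24h_fp);
  if (!Number.isFinite(v)) throw new Error(`bad volume: ${raw.volume_24h_fp}`);
  return v;
}

// Keep all downstream price math in integer cents, never dollar floats.
function dollarsToCents(dollars: string): number {
  return Math.round(Number(dollars) * 100);
}
```

Converting to integer cents at the API boundary means the rest of the pipeline never touches a dollar float.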
The fix is boring but essential: curl every endpoint you plan to use, inspect the raw JSON, and build a validation step that asserts non-zero data for at least one market before writing to storage. If you’re working with the Polymarket API or Kalshi API, validate field names against live responses on day one. Our prediction market API reference covers the canonical field mappings across both platforms.
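The validation gate can be as small as this sketch, which assumes a normalized record with a volumeNum field and refuses to persist an all-zero batch:

```typescript
interface NormalizedMarket { id: string; volumeNum: number }

function assertBatchHasSignal(markets: NormalizedMarket[]): void {
  const live = markets.some((m) => m.volumeNum > 0);
  if (!live) {
    // An all-zero batch usually means a field-name mismatch, not a quiet day.
    throw new Error("validation failed: every market has zero volume");
  }
}
```

Run it once per pipeline execution, before the write to storage: a thrown error on deploy day is vastly cheaper than a week of zeroes in your dashboards.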
The Hidden Junk Data Problem
Kalshi’s /markets endpoint returns over 10,000 open markets. Impressive until you realize the first 6,000+ are auto-generated parlay and multi-game markets (tickers starting with KXMVE) with zero volume, zero depth, and zero trading activity. The real prediction markets — politics, economics, weather — are buried pages deep.
The solution: use Kalshi’s /events endpoint with with_nested_markets=true. This returns actual prediction events with their markets nested inside. No parlays. Market count drops from 10,000+ to around 3,500, but every one is a real market somebody might trade.
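A sketch of the events-based scan. The base URL and paginated response shape are assumptions to verify against live responses; the with_nested_markets=true parameter and KXMVE prefix filter come straight from the behavior described above:

```typescript
const KALSHI_API = "https://api.elections.kalshi.com/trade-api/v2"; // assumed base URL

function isJunkTicker(ticker: string): boolean {
  // Auto-generated parlay/multi-game markets share the KXMVE prefix.
  return ticker.startsWith("KXMVE");
}

async function fetchRealMarkets(): Promise<unknown[]> {
  const markets: unknown[] = [];
  let cursor = "";
  do {
    const url = `${KALSHI_API}/events?with_nested_markets=true` +
      (cursor ? `&cursor=${cursor}` : "");
    const res = await fetch(url);
    const body = (await res.json()) as { events?: any[]; cursor?: string };
    for (const event of body.events ?? []) {
      for (const m of event.markets ?? []) {
        if (!isJunkTicker(m.ticker)) markets.push(m);
      }
    }
    cursor = body.cursor ?? "";
  } while (cursor);
  return markets;
}
```

The ticker filter is belt-and-suspenders: /events already excludes most of the junk, but a cheap prefix check catches anything that leaks through.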
This matters for any agent operating at the Trading layer of the Agent Betting Stack. If your market scanner doesn’t filter junk data, your opportunity set is noise. Market count is a vanity metric — tradable market count is the metric that matters.
The Bids-Only Orderbook
Kalshi’s orderbook doesn’t return asks. It returns YES bids and NO bids. To derive the YES ask price, you compute 100 - best_NO_bid, with both prices in cents. The NO bids array becomes the YES asks array with inverted prices.
This is mathematically correct but introduces a mirror operation that no other exchange API requires. If your liquidity scoring or spread calculations assume a standard bid/ask format, they’ll produce garbage for Kalshi without explicit normalization. Verify derived spreads against Kalshi’s inline yes_bid_dollars / yes_ask_dollars fields.
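The mirror operation in miniature, with all prices in integer cents (the Level shape is illustrative):

```typescript
type Level = { priceCents: number; size: number };

// Convert the NO-side bid ladder into the YES-side ask ladder.
function noBidsToYesAsks(noBids: Level[]): Level[] {
  return noBids
    .map(({ priceCents, size }) => ({ priceCents: 100 - priceCents, size }))
    .sort((a, b) => a.priceCents - b.priceCents); // best (lowest) ask first
}

function spreadCents(yesBids: Level[], yesAsks: Level[]): number | null {
  if (yesBids.length === 0 || yesAsks.length === 0) return null; // one-sided book
  const bestBid = Math.max(...yesBids.map((l) => l.priceCents));
  const bestAsk = Math.min(...yesAsks.map((l) => l.priceCents));
  return bestAsk - bestBid;
}
```

Note that the one-sided case returns null rather than a fake spread: ghost markets with empty books should fail loudly in scoring, not score as zero-spread.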
Memory Limits on Serverless Platforms
The “fetch everything, then process” pattern — clean and intuitive on a regular server — explodes on Cloudflare Workers. The 128MB memory ceiling means you cannot accumulate nested JSON responses from hundreds of markets before processing.
The fix requires iterative architecture from the start. Use flat endpoints instead of nested ones (Polymarket /markets instead of /events). Process orderbooks one at a time: fetch, score, discard. Never hold more than one orderbook in memory simultaneously. This drops peak memory from hundreds of MB to single digits.
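The fetch-score-discard loop looks like this sketch, where fetchOrderbook and scoreOrderbook stand in for your own API and scoring code:

```typescript
interface Scored { marketId: string; score: number }

async function scoreMarkets(
  marketIds: string[],
  fetchOrderbook: (id: string) => Promise<unknown>,
  scoreOrderbook: (book: unknown) => number,
): Promise<Scored[]> {
  const results: Scored[] = [];
  for (const marketId of marketIds) {
    const book = await fetchOrderbook(marketId); // exactly one book in memory
    results.push({ marketId, score: scoreOrderbook(book) });
    // `book` goes out of scope here, so peak memory stays near the size of
    // a single orderbook plus the small array of scores.
  }
  return results;
}
```

The accumulated results are tiny (an ID and a number per market); it's the raw nested orderbook JSON that kills the 128MB ceiling when you hold hundreds at once.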
If you’re building betting agents on serverless infrastructure, this constraint should inform your architecture before you write a line of scoring logic. The Agent Intelligence layer works best when the underlying data pipeline is designed for the runtime it actually deploys on.
Subrequest and Timeout Ceilings
Every fetch() call on a Cloudflare Worker counts against a 1,000-subrequest limit (paid plan). With two platforms, 400 orderbook fetches each, and pagination overhead, you’re at roughly 815 subrequests. That leaves headroom, but it means you cannot score every market — you must rank by volume and score the top N.
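The budget arithmetic above is worth encoding as a guard rather than a mental note. A sketch, using the counts from the text (the 15-call pagination overhead is an assumed round number):

```typescript
const SUBREQUEST_LIMIT = 1000; // Cloudflare Workers paid-plan ceiling

function subrequestBudget(opts: {
  platforms: number;
  orderbooksPerPlatform: number;
  paginationCalls: number;
}): { used: number; headroom: number } {
  const used = opts.platforms * opts.orderbooksPerPlatform + opts.paginationCalls;
  return { used, headroom: SUBREQUEST_LIMIT - used };
}

// Two platforms x 400 orderbooks + ~15 discovery pages ≈ 815 calls.
```

Checking headroom before the run starts turns a mid-pipeline hard failure into a cheap up-front adjustment of N.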
The timeout trap is subtler. Using ctx.waitUntil() for background processing on fetch handlers gives about 30 seconds. A pipeline fetching 400 orderbooks with rate-limiting delays blows past that silently — the task just gets cancelled with no error, no log, no indication it died.
The fix: use cron triggers (15-minute budget) for production runs. For manual testing, run inline synchronously. Design your pipeline around the subrequest budget the way you’d design around a memory budget — it’s the binding constraint.
Selection Bias in Health Metrics
This one isn’t a code bug — it’s a statistics bug. Score the top 100 markets by volume, compute health metrics from those scores, and you get “0% ghost markets, median score 78.” Looks great. But you cherry-picked the most active markets — of course they’re healthy.
The real question: what fraction of all listed markets are actually tradable? The answer changes your understanding of both platforms. Split into two phases: cheap discovery (paginate all markets, extract lightweight metadata) and expensive scoring (full orderbooks for top 400). Use discovery data for health dashboards and category heatmaps. Use scored data for spread and depth analysis.
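The two-phase split reduces to two small functions in sketch form; the field names are illustrative, not an API contract:

```typescript
interface DiscoveredMarket { id: string; volume: number }

// Phase 2 input: only the top N by volume get expensive orderbook scoring.
function selectForScoring(all: DiscoveredMarket[], topN: number): DiscoveredMarket[] {
  return [...all].sort((a, b) => b.volume - a.volume).slice(0, topN);
}

// Health metrics divide by the FULL discovered universe, never the scored subset.
function tradableFraction(
  all: DiscoveredMarket[],
  tradable: (m: DiscoveredMarket) => boolean,
): number {
  return all.filter(tradable).length / all.length;
}
```

Keeping the denominator function separate from the selection function makes the selection-bias mistake structurally hard to repeat: the health dashboard never even sees the top-N list.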
This principle applies to any agent building a market scanning skill or position-sizing model. Your denominators need the full universe, not a pre-filtered subset. If your Kelly Criterion sizing assumes every market in your feed is liquid, you’re sizing positions against phantom liquidity.
Signal Cadence vs. Signal Half-Life
Not every derived signal works at every refresh rate. At 8-hour cron snapshots:
- Volume/depth ratios (smart money proxy): catch sustained sweeps but miss intraday events that replenish within hours. Fine for editorial use — “is this price reliable to cite?” — but not for real-time alerts.
- Cross-platform price gaps (arbitrage proxy): real arb opportunities close in minutes. At 8-hour cadence, what you detect is persistent platform disagreement — useful for analysis but not actionable for trading.
- Platform health trends: move over weeks. Daily is more than sufficient.
Match your infrastructure to your signal’s half-life. Don’t build a cron job for something that needs websockets. Don’t build websockets for something that changes weekly. This is the core tension in the Intelligence layer — the right answer depends on what you’re optimizing for.
The Liquidity Scoring Formula
For reference, the composite score that survived all the mistakes above:
Four components, each 0-25 points. Spread: bid-ask tightness relative to tick size, with price-zone adjustment for extreme prices. Depth: resting size within 10% of midpoint, using the weaker side as bottleneck, log-scaled. Volume: 24-hour trading volume, log-scaled. Balance: symmetry of bid vs. ask depth as a min/max ratio.
Total is clamped to 0-100 and mapped to tiers: Excellent (80+), Good (60-79), Fair (40-59), Thin (20-39), Illiquid (0-19). All price math uses integer cents to avoid floating-point drift. The Vig Index applies a similar philosophy to sportsbook efficiency — measure the thing that matters, not the thing that’s easy to measure.
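The skeleton of that composite, as a sketch: four 0-25 inputs summed, clamped, and mapped to the tiers above. The hard part (per-component scaling with tick sizes, log bases, and price-zone adjustments) is deliberately left out here:

```typescript
function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

function liquidityScore(c: {
  spread: number;  // 0-25: bid-ask tightness relative to tick size
  depth: number;   // 0-25: log-scaled resting size within 10% of midpoint
  volume: number;  // 0-25: log-scaled 24h volume
  balance: number; // 0-25: min/max ratio of bid vs. ask depth
}): number {
  return clamp(c.spread + c.depth + c.volume + c.balance, 0, 100);
}

function tier(score: number): string {
  if (score >= 80) return "Excellent";
  if (score >= 60) return "Good";
  if (score >= 40) return "Fair";
  if (score >= 20) return "Thin";
  return "Illiquid";
}
```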
What the Data Reveals
After fixing all seven mistakes, the cross-platform picture is stark. Polymarket’s top 400 markets include 352 scoring 60+ (liquid). Category breakdown: roughly half sports, 43% politics, the rest crypto and miscellaneous. Kalshi’s top 400 include zero markets scoring 60+ and only 45 scoring 20+ (tradable). The median scored market on Kalshi sits at 11.
This isn’t a judgment about platform quality — Kalshi operates in a regulated environment with different market structure constraints. But if you’re building an agent that treats prediction market prices as trading signals, the liquidity behind those prices determines whether the signal is real or a stale quote on a ghost market.
Building Forward
Every mistake on this list cost at least a day of debugging. Most produced silent failures — correct-looking outputs that were quietly wrong. The common thread: validate against reality, not documentation. Design for constraints, not ideals. And measure the full universe before drawing conclusions from a subset.
If you’re building prediction market infrastructure for autonomous agents, the Agent Betting Stack provides the architectural framework, and the MCP server gives agents direct access to market data without building custom integrations. Start with the constraints — memory, subrequests, signal cadence — and let those shape the architecture.
