Agent Intelligence Guide: LLM Analysis for Prediction Markets
The intelligence layer is the brain of your prediction market agent. Layers 1 through 3 give your agent an identity, a wallet, and the ability to execute trades. Layer 4 decides what to trade and when.
This is the hardest layer to build and the one with the most variation. There is no single correct architecture — the best agents combine multiple signal sources, calibrate confidence carefully, and adapt their strategy to market conditions. This guide walks through every component you need, with working code for each.
Prerequisites
Before building your intelligence layer, you should have:
- A working Layer 3 setup (either Polymarket or Kalshi)
- Python 3.10+ with
pipavailable - An API key for at least one LLM provider (Anthropic recommended)
- Familiarity with prediction market mechanics (see the glossary)
Install the core dependencies used throughout this guide:
pip install anthropic httpx pydantic numpy
The Core Decision Loop
Every prediction market agent, regardless of strategy, follows the same loop:
┌──────────────────────────────────────────────────┐
│ DECISION LOOP │
│ │
│ ┌─────────┐ ┌─────────┐ ┌──────────┐ │
│ │ OBSERVE │───▶│ ANALYZE │───▶│ DECIDE │ │
│ │ │ │ │ │ │ │
│ │ Market │ │ LLM + │ │ Edge + │ │
│ │ data, │ │ signals │ │ sizing │ │
│ │ news, │ │ + Bayes │ │ logic │ │
│ │ social │ │ │ │ │ │
│ └─────────┘ └─────────┘ └────┬─────┘ │
│ │ │
│ ┌─────────┐ │ │
│ │ EXECUTE │◀───────────────────────┘ │
│ │ │ │
│ │ Layer 3 │ │
│ │ trade │ │
│ └─────────┘ │
└──────────────────────────────────────────────────┘
Observe: Pull market data from Polymarket or Kalshi APIs, fetch news and social signals, check your current positions.
Analyze: Feed observations into your intelligence pipeline — LLM evaluation, sentiment scoring, Bayesian updates, signal aggregation.
Decide: Compare your estimated probability to the market price. If the edge exceeds your threshold, calculate position size.
Execute: Send orders through Layer 3. Log the decision and result for future backtesting.
The rest of this guide breaks down the Analyze and Decide phases in detail.
LLM Prompt Patterns for Market Evaluation
LLMs are the fastest way to get a working intelligence layer. A well-crafted prompt can evaluate a prediction market with surprising accuracy — especially for markets that depend on reasoning about public information rather than private data.
Basic Market Analysis Prompt
The simplest useful pattern sends market metadata to an LLM and asks for a probability estimate:
import anthropic
import json
client = anthropic.Anthropic() # uses ANTHROPIC_API_KEY env var
def evaluate_market(question: str, yes_price: float, volume_24h: float,
end_date: str, description: str) -> dict:
"""Ask an LLM to evaluate a prediction market and return structured analysis."""
prompt = f"""You are an expert prediction market analyst. Evaluate this market
and provide your independent probability estimate.
Market question: {question}
Current Yes price: ${yes_price:.2f} (implies {yes_price * 100:.1f}% probability)
24h volume: ${volume_24h:,.0f}
Resolution date: {end_date}
Description: {description}
Instructions:
1. Reason step-by-step about the likely outcome
2. Consider base rates, recent developments, and known factors
3. Assign your probability estimate for Yes (0.0 to 1.0)
4. Rate your confidence in this estimate (1-10)
5. If your estimate differs from the market by more than 5 percentage points,
explain why you think the market is wrong
Respond with valid JSON only:
{{
"reasoning": "your step-by-step analysis",
"probability": 0.XX,
"confidence": N,
"edge_direction": "yes" | "no" | "none",
"edge_explanation": "why you disagree with the market, or null"
}}"""
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[{"role": "user", "content": prompt}]
)
return json.loads(response.content[0].text)
This works, but it has limitations. The LLM’s knowledge has a cutoff date, it can hallucinate confidence, and it has no access to real-time information. The sections below address each of these.
Structured Output with Pydantic
For production agents, enforce output structure with Pydantic so your downstream code never breaks on malformed LLM responses:
from pydantic import BaseModel, Field
class MarketAnalysis(BaseModel):
reasoning: str = Field(description="Step-by-step analysis")
probability: float = Field(ge=0.0, le=1.0, description="Estimated probability of Yes")
confidence: int = Field(ge=1, le=10, description="Confidence in estimate")
edge_direction: str = Field(description="yes, no, or none")
edge_explanation: str | None = Field(default=None)
def evaluate_market_structured(question: str, yes_price: float,
context: str) -> MarketAnalysis:
"""Evaluate a market with guaranteed structured output."""
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[{
"role": "user",
"content": f"""Analyze this prediction market and respond with JSON matching
this schema: {MarketAnalysis.model_json_schema()}
Question: {question}
Current Yes price: {yes_price}
Additional context: {context}"""
}]
)
return MarketAnalysis.model_validate_json(response.content[0].text)
Prompt Chaining: Research → Analyze → Decide
A single prompt can only use the LLM’s training data. For better results, chain multiple calls where earlier calls gather information that later calls analyze:
async def research_and_analyze(question: str, yes_price: float) -> MarketAnalysis:
"""Three-stage prompt chain: research, analyze, decide."""
# Stage 1: Research — identify what information matters
research_prompt = f"""For this prediction market question, list the 5 most important
factors that would determine the outcome. For each factor, describe what data
source could verify it (news API, social media, government data, etc.).
Question: {question}
Respond with JSON: [{{"factor": "...", "data_source": "...", "importance": 1-5}}]"""
research = client.messages.create(
model="claude-haiku-4-5-20251001", # fast + cheap for research
max_tokens=512,
messages=[{"role": "user", "content": research_prompt}]
)
factors = json.loads(research.content[0].text)
# Stage 2: Gather data for each factor (see Sentiment section below)
context = await gather_signals(factors)
# Stage 3: Final analysis with full context
analysis_prompt = f"""You are analyzing a prediction market with real-time data.
Question: {question}
Current Yes price: {yes_price}
Research factors and findings:
{json.dumps(context, indent=2)}
Based on this evidence, provide your probability estimate.
Respond with JSON: {MarketAnalysis.model_json_schema()}"""
response = client.messages.create(
model="claude-sonnet-4-6", # best model for final decision
max_tokens=1024,
messages=[{"role": "user", "content": analysis_prompt}]
)
return MarketAnalysis.model_validate_json(response.content[0].text)
Using claude-haiku-4-5-20251001 for research keeps costs low. Reserve claude-sonnet-4-6 or claude-opus-4-6 for the final analysis where accuracy matters most.
Model Selection
| Model | Best For | Cost | Latency |
|---|---|---|---|
claude-opus-4-6 | Complex reasoning, multi-factor analysis | Highest | Slowest |
claude-sonnet-4-6 | Good balance of accuracy and speed | Medium | Medium |
claude-haiku-4-5-20251001 | Research, summarization, data extraction | Lowest | Fastest |
| GPT-4o | Alternative to Sonnet, comparable accuracy | Medium | Medium |
| Open-source (Llama, Mistral) | Self-hosted, no API costs, full control | Compute only | Variable |
For most agents, the sweet spot is Haiku for research stages and Sonnet for final analysis. Use Opus only for high-stakes markets where the edge needs to be precise.
Sentiment Analysis Pipelines
LLMs reason well from their training data, but prediction markets move on new information. Sentiment analysis gives your agent a real-time view of public opinion.
Architecture
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ X/Twitter│ │ Reddit │ │ News │ │ Moltbook │
│ API │ │ API │ │ APIs │ │ Feed │
└────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │ │
▼ ▼ ▼ ▼
┌──────────────────────────────────────────────────┐
│ Sentiment Scorer │
│ (LLM classifies each item as bullish/bearish) │
└──────────────────────┬───────────────────────────┘
│
▼
┌─────────────────┐
│ Weighted Score │
│ (-1.0 to +1.0) │
└─────────────────┘
Fetching Signals
import httpx
from dataclasses import dataclass
from datetime import datetime
@dataclass
class Signal:
source: str # "twitter", "reddit", "news", "moltbook"
text: str # the content
timestamp: datetime
metadata: dict # followers, upvotes, source credibility, etc.
async def fetch_twitter_signals(query: str, count: int = 20) -> list[Signal]:
"""Fetch recent tweets about a topic. Requires X API Bearer Token."""
async with httpx.AsyncClient() as client:
resp = await client.get(
"https://api.x.com/2/tweets/search/recent",
headers={"Authorization": f"Bearer {TWITTER_BEARER_TOKEN}"},
params={
"query": query,
"max_results": count,
"tweet.fields": "created_at,public_metrics,author_id"
}
)
data = resp.json()
return [
Signal(
source="twitter",
text=tweet["text"],
timestamp=datetime.fromisoformat(tweet["created_at"].rstrip("Z")),
metadata=tweet.get("public_metrics", {})
)
for tweet in data.get("data", [])
]
async def fetch_reddit_signals(subreddit: str, query: str,
count: int = 20) -> list[Signal]:
"""Fetch recent Reddit posts. Uses public JSON endpoint."""
async with httpx.AsyncClient() as client:
resp = await client.get(
f"https://www.reddit.com/r/{subreddit}/search.json",
params={"q": query, "sort": "new", "limit": count, "t": "week"},
headers={"User-Agent": "AgentBets/1.0"}
)
data = resp.json()
return [
Signal(
source="reddit",
text=f"{post['data']['title']} {post['data'].get('selftext', '')}",
timestamp=datetime.fromtimestamp(post["data"]["created_utc"]),
metadata={"score": post["data"]["score"],
"num_comments": post["data"]["num_comments"]}
)
for post in data["data"]["children"]
]
async def fetch_news_signals(query: str, count: int = 10) -> list[Signal]:
"""Fetch recent news articles via NewsAPI."""
async with httpx.AsyncClient() as client:
resp = await client.get(
"https://newsapi.org/v2/everything",
params={
"q": query, "sortBy": "publishedAt",
"pageSize": count, "apiKey": NEWS_API_KEY
}
)
articles = resp.json().get("articles", [])
return [
Signal(
source="news",
text=f"{a['title']}. {a.get('description', '')}",
timestamp=datetime.fromisoformat(a["publishedAt"].rstrip("Z")),
metadata={"source_name": a["source"]["name"]}
)
for a in articles
]
Scoring Sentiment with an LLM
Rather than using traditional NLP sentiment libraries (which struggle with prediction market context), use a cheap LLM call to classify each signal:
async def score_signals(signals: list[Signal], market_question: str) -> float:
"""Score a batch of signals from -1.0 (bearish) to +1.0 (bullish)."""
if not signals:
return 0.0
signal_text = "\n".join(
f"[{s.source}] {s.text[:200]}" for s in signals[:30] # cap context size
)
response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=256,
messages=[{
"role": "user",
"content": f"""Rate the overall sentiment of these signals regarding
this prediction market question:
Question: {market_question}
Signals:
{signal_text}
Score from -1.0 (strongly suggests No) to +1.0 (strongly suggests Yes).
Consider volume, recency, and source credibility.
Respond with JSON only: {{"score": X.XX, "reasoning": "brief explanation"}}"""
}]
)
result = json.loads(response.content[0].text)
return float(result["score"])
Signal Aggregation and Confidence Scoring
Individual signals are noisy. An agent that acts on a single tweet or one LLM analysis will lose money. Signal aggregation combines multiple sources into a single confidence-weighted estimate.
The Signal Aggregator
from dataclasses import dataclass, field
@dataclass
class WeightedSignal:
name: str
value: float # probability estimate (0.0 to 1.0)
confidence: float # how much to trust this signal (0.0 to 1.0)
weight: float # base weight for this signal type
class SignalAggregator:
"""Combine multiple probability signals into a single estimate."""
def __init__(self):
self.signals: list[WeightedSignal] = []
def add(self, name: str, value: float, confidence: float, weight: float = 1.0):
self.signals.append(WeightedSignal(name, value, confidence, weight))
def aggregate(self) -> dict:
"""Weighted average where each signal's influence is weight * confidence."""
if not self.signals:
return {"probability": 0.5, "confidence": 0.0, "n_signals": 0}
total_weight = 0.0
weighted_sum = 0.0
for s in self.signals:
effective_weight = s.weight * s.confidence
weighted_sum += s.value * effective_weight
total_weight += effective_weight
if total_weight == 0:
return {"probability": 0.5, "confidence": 0.0,
"n_signals": len(self.signals)}
probability = weighted_sum / total_weight
# Confidence increases with more agreeing signals
agreement = 1.0 - self._signal_variance()
avg_confidence = sum(s.confidence for s in self.signals) / len(self.signals)
overall_confidence = min(agreement * avg_confidence, 1.0)
return {
"probability": round(probability, 4),
"confidence": round(overall_confidence, 4),
"n_signals": len(self.signals),
"signals": [
{"name": s.name, "value": s.value, "confidence": s.confidence}
for s in self.signals
]
}
def _signal_variance(self) -> float:
values = [s.value for s in self.signals]
mean = sum(values) / len(values)
return sum((v - mean) ** 2 for v in values) / len(values)
Usage Example
aggregator = SignalAggregator()
# LLM analysis (high weight — this is your primary signal)
llm_result = evaluate_market_structured(question, yes_price, context)
aggregator.add("llm_analysis", llm_result.probability,
llm_result.confidence / 10, weight=3.0)
# Sentiment from social signals
sentiment = await score_signals(twitter_signals + reddit_signals, question)
sentiment_as_prob = (sentiment + 1.0) / 2.0 # convert -1..1 to 0..1
aggregator.add("social_sentiment", sentiment_as_prob, 0.5, weight=1.0)
# News sentiment
news_sentiment = await score_signals(news_signals, question)
news_as_prob = (news_sentiment + 1.0) / 2.0
aggregator.add("news_sentiment", news_as_prob, 0.6, weight=1.5)
# Polyseer analysis (if available)
polyseer_result = await get_polyseer_analysis(market_id)
if polyseer_result:
aggregator.add("polyseer", polyseer_result["probability"],
polyseer_result["confidence"], weight=2.0)
result = aggregator.aggregate()
# {"probability": 0.62, "confidence": 0.71, "n_signals": 4, "signals": [...]}
Bayesian Probability Estimation
The market price is information. It represents the collective wisdom of all participants. Your agent should not ignore it — it should update from it. Bayesian estimation lets you start with the market’s probability (prior) and adjust based on your own evidence (likelihood).
How Bayesian Updating Works
- Prior: Start with the market price as your initial probability estimate
- Likelihood: For each piece of new evidence, estimate how likely that evidence would be if the outcome were Yes vs. No
- Posterior: Apply Bayes’ theorem to get an updated probability
def bayesian_update(prior: float, evidence: list[tuple[float, float]]) -> float:
"""Update a probability estimate with new evidence.
Args:
prior: Starting probability (e.g., market price)
evidence: List of (likelihood_if_yes, likelihood_if_no) tuples.
Each tuple represents one piece of evidence.
Returns:
Updated (posterior) probability.
"""
p_yes = prior
p_no = 1.0 - prior
for likelihood_yes, likelihood_no in evidence:
# Bayes' theorem: P(Yes|E) = P(E|Yes) * P(Yes) / P(E)
p_evidence = likelihood_yes * p_yes + likelihood_no * p_no
if p_evidence == 0:
continue
p_yes = (likelihood_yes * p_yes) / p_evidence
p_no = 1.0 - p_yes
return p_yes
Practical Example
Suppose a market asks “Will Company X announce layoffs this quarter?” and the current price is $0.30 (30% implied probability).
market_price = 0.30 # prior
evidence = [
# Bloomberg reports hiring freeze → more likely if layoffs coming
(0.8, 0.3), # P(hiring freeze | layoffs) = 0.8, P(hiring freeze | no layoffs) = 0.3
# CEO tweets "excited about growth" → less likely if layoffs coming
(0.2, 0.7), # P(growth tweet | layoffs) = 0.2, P(growth tweet | no layoffs) = 0.7
# Glassdoor reviews mention "restructuring" → more likely if layoffs
(0.7, 0.2), # P(restructuring mentions | layoffs) = 0.7, P(no restructuring | no) = 0.2
]
posterior = bayesian_update(market_price, evidence)
# posterior ≈ 0.56 — your estimate is now 56%, market says 30%
# This is a potential 26-point edge on Yes
Estimating Likelihoods with an LLM
You can use an LLM to estimate the likelihood ratios for each piece of evidence:
def estimate_likelihood(evidence_text: str, market_question: str) -> tuple[float, float]:
"""Ask an LLM to estimate P(evidence | Yes) and P(evidence | No)."""
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=256,
messages=[{
"role": "user",
"content": f"""Given this prediction market question and a piece of evidence,
estimate two probabilities:
Question: {market_question}
Evidence: {evidence_text}
1. P(evidence | Yes): How likely would this evidence be if the answer is Yes?
2. P(evidence | No): How likely would this evidence be if the answer is No?
Both should be between 0.01 and 0.99.
Respond with JSON: {{"p_evidence_given_yes": X.XX, "p_evidence_given_no": X.XX}}"""
}]
)
result = json.loads(response.content[0].text)
return (result["p_evidence_given_yes"], result["p_evidence_given_no"])
Edge Detection and When to Bet
Your agent should only bet when it has a meaningful edge — when its probability estimate differs enough from the market price to overcome fees and uncertainty.
Defining Edge
def calculate_edge(agent_probability: float, market_price: float,
confidence: float, platform_fee: float = 0.02) -> dict:
"""Calculate the edge and whether it's worth betting.
Args:
agent_probability: Your estimated probability of Yes
market_price: Current Yes price on the platform
confidence: Your confidence in the estimate (0-1)
platform_fee: Platform fee as a fraction (Polymarket ≈ 2%, Kalshi ≈ 7%)
Returns:
Dict with edge calculations and recommendation.
"""
raw_edge_yes = agent_probability - market_price
raw_edge_no = (1 - agent_probability) - (1 - market_price)
# Confidence-weighted edge
weighted_edge_yes = raw_edge_yes * confidence
weighted_edge_no = raw_edge_no * confidence
# Account for fees
net_edge_yes = weighted_edge_yes - platform_fee
net_edge_no = weighted_edge_no - platform_fee
# Determine best direction
if net_edge_yes > net_edge_no and net_edge_yes > 0:
direction = "yes"
net_edge = net_edge_yes
elif net_edge_no > 0:
direction = "no"
net_edge = net_edge_no
else:
direction = "none"
net_edge = 0.0
return {
"direction": direction,
"raw_edge": round(max(raw_edge_yes, raw_edge_no), 4),
"confidence_weighted_edge": round(max(weighted_edge_yes, weighted_edge_no), 4),
"net_edge_after_fees": round(net_edge, 4),
"should_bet": net_edge > 0.03, # minimum 3% net edge threshold
"agent_probability": agent_probability,
"market_price": market_price,
}
Edge Thresholds
Not every positive edge is worth trading. Set minimum thresholds based on your confidence level:
| Confidence | Minimum Net Edge | Reasoning |
|---|---|---|
| 8-10 | 3% | High confidence — small edge is acceptable |
| 5-7 | 7% | Moderate confidence — need larger buffer |
| 1-4 | 12%+ | Low confidence — only bet on large mispricings |
Kelly Criterion Preview
Once you’ve found an edge, how much should you bet? The Kelly criterion gives the mathematically optimal fraction of your bankroll:
def kelly_fraction(probability: float, market_price: float) -> float:
"""Calculate optimal bet size as a fraction of bankroll.
This is full Kelly — most practitioners use half-Kelly or quarter-Kelly
to reduce variance.
"""
if probability <= market_price:
return 0.0 # no edge
# Odds offered by the market
odds = (1 - market_price) / market_price
# Kelly formula: f = (p * (odds + 1) - 1) / odds
f = (probability * (odds + 1) - 1) / odds
return max(0.0, min(f, 1.0)) # clamp to [0, 1]
# Example: you estimate 65% probability, market is at 50%
fraction = kelly_fraction(0.65, 0.50)
# fraction ≈ 0.30 — Kelly says bet 30% of bankroll
# Half-Kelly (safer): 0.15
# Quarter-Kelly (conservative): 0.075
Full Kelly is aggressive. Most successful traders use half-Kelly (multiply by 0.5) or quarter-Kelly (multiply by 0.25) to reduce the risk of large drawdowns. For a comprehensive treatment of position sizing and bankroll management, see the Risk Management Guide (coming soon).
Strategy Types
Different market conditions call for different strategies. A well-designed agent can switch between strategies or run multiple strategies in parallel.
Momentum Strategy
Bet in the direction of recent price movement. If a market has been trending toward Yes, momentum says it will continue.
def momentum_signal(price_history: list[float], lookback: int = 10) -> dict:
"""Generate a momentum signal from recent price history.
Args:
price_history: List of recent Yes prices (oldest first)
lookback: Number of periods to analyze
Returns:
Signal with direction and strength.
"""
if len(price_history) < lookback:
return {"direction": "none", "strength": 0.0}
recent = price_history[-lookback:]
price_change = recent[-1] - recent[0]
avg_price = sum(recent) / len(recent)
# Normalize strength to [-1, 1]
strength = max(-1.0, min(1.0, price_change / max(avg_price, 0.01)))
if strength > 0.05:
direction = "yes"
elif strength < -0.05:
direction = "no"
else:
direction = "none"
return {"direction": direction, "strength": abs(strength)}
Best for: Markets with clear information cascades — elections after major poll releases, crypto markets after regulatory news.
Contrarian Strategy
Bet against the crowd when your signals disagree with the market direction. This works when markets overshoot on emotion.
def contrarian_signal(market_price: float, sentiment_score: float,
llm_probability: float, threshold: float = 0.15) -> dict:
"""Contrarian: bet against the crowd when fundamentals disagree.
The logic: if sentiment is extremely bullish but the LLM analysis
says the probability should be lower, the market may be overpriced.
"""
sentiment_implied = (sentiment_score + 1.0) / 2.0 # convert to probability
# Crowd-fundamental divergence
crowd_optimism = sentiment_implied - llm_probability
if crowd_optimism > threshold:
# Crowd too bullish → bet No
return {"direction": "no", "strength": crowd_optimism}
elif crowd_optimism < -threshold:
# Crowd too bearish → bet Yes
return {"direction": "yes", "strength": abs(crowd_optimism)}
else:
return {"direction": "none", "strength": 0.0}
Best for: Markets driven by social media hype — celebrity predictions, meme-coin adjacent markets.
Event-Driven Strategy
React to specific news events that should move market prices. The agent monitors news feeds and trades when a catalyst appears.
Try the starter bot: The Kalshi News Bot implements this pattern with Claude analysis and one-click Railway deploy.
async def event_driven_scan(markets: list[dict],
news_signals: list[Signal]) -> list[dict]:
"""Scan for markets affected by recent news events."""
opportunities = []
for market in markets:
# Check if any recent news is relevant to this market
relevant_news = [
s for s in news_signals
if any(keyword in s.text.lower()
for keyword in market.get("keywords", []))
]
if not relevant_news:
continue
# Use LLM to assess impact
impact = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=256,
messages=[{
"role": "user",
"content": f"""How does this news affect this prediction market?
Market: {market['question']}
Current price: {market['yes_price']}
News:
{chr(10).join(s.text[:150] for s in relevant_news[:5])}
Respond with JSON:
{{"impact": "bullish"|"bearish"|"neutral", "magnitude": 0.0-1.0,
"new_probability": 0.XX, "reasoning": "brief"}}"""
}]
)
result = json.loads(impact.content[0].text)
if result["magnitude"] > 0.3: # significant impact
opportunities.append({
"market": market,
"news": relevant_news,
"impact": result
})
return opportunities
Best for: Markets tied to scheduled events — earnings calls, elections, policy decisions, court rulings.
Arbitrage Strategy (Preview)
Arbitrage exploits pricing differences for the same outcome across platforms. For example, if Polymarket prices “Yes” at $0.60 and Kalshi prices the same outcome at $0.55, you can buy Yes on Kalshi and No on Polymarket for a risk-free profit.
def find_arbitrage(polymarket_yes: float, kalshi_yes: float,
poly_fee: float = 0.02, kalshi_fee: float = 0.07) -> dict:
"""Check for cross-platform arbitrage opportunity."""
# Cost to buy Yes on cheaper platform + No on expensive platform
# should be less than $1.00 for arbitrage to exist
# Strategy 1: Buy Yes on Kalshi, No on Polymarket
cost_1 = kalshi_yes + (1 - polymarket_yes) + poly_fee + kalshi_fee
profit_1 = 1.0 - cost_1
# Strategy 2: Buy Yes on Polymarket, No on Kalshi
cost_2 = polymarket_yes + (1 - kalshi_yes) + poly_fee + kalshi_fee
profit_2 = 1.0 - cost_2
best = max(profit_1, profit_2)
return {
"has_opportunity": best > 0,
"best_profit": round(best, 4),
"strategy": "kalshi_yes_poly_no" if profit_1 > profit_2
else "poly_yes_kalshi_no",
}
Cross-platform arbitrage is the most complex strategy to implement correctly because of settlement timing, fee structures, and resolution differences between platforms. For the complete treatment, see the Cross-Platform Arbitrage Guide (coming soon).
Tool Integration Patterns
Three tools in the ecosystem are purpose-built for the intelligence layer. Each serves a different role.
OpenClaw: Intelligence as Skills
OpenClaw is an open-source agent framework with a skill system. You can wrap your intelligence logic as OpenClaw skills, making them composable and reusable.
# Example: Register a market analysis skill with OpenClaw
# This is a simplified pattern — see OpenClaw docs for full skill API
class MarketAnalysisSkill:
"""OpenClaw skill that evaluates a prediction market."""
name = "market_analysis"
description = "Evaluate a prediction market using LLM analysis and sentiment"
async def execute(self, market_id: str, platform: str = "polymarket") -> dict:
# Fetch market data via Layer 3
market = await fetch_market(market_id, platform)
# Run intelligence pipeline
llm_result = evaluate_market_structured(
market["question"], market["yes_price"], market["description"]
)
signals = await fetch_twitter_signals(market["question"])
sentiment = await score_signals(signals, market["question"])
# Aggregate
aggregator = SignalAggregator()
aggregator.add("llm", llm_result.probability,
llm_result.confidence / 10, weight=3.0)
aggregator.add("sentiment", (sentiment + 1) / 2, 0.5, weight=1.0)
result = aggregator.aggregate()
# Edge detection
edge = calculate_edge(result["probability"], market["yes_price"],
result["confidence"])
return {"analysis": result, "edge": edge, "market": market}
OpenClaw’s memory system also lets your agent learn from past trades, storing which markets and strategies performed well.
Polyseer: Multi-Agent Bayesian Analysis
Polyseer provides a ready-made intelligence pipeline with multi-agent architecture and Bayesian probability aggregation. Instead of building everything from scratch, you can use Polyseer as your primary analysis engine:
import httpx
async def get_polyseer_analysis(market_id: str) -> dict | None:
"""Fetch Polyseer's multi-agent analysis for a market."""
async with httpx.AsyncClient() as client:
resp = await client.get(
f"https://api.polyseer.com/v1/analysis/{market_id}",
headers={"Authorization": f"Bearer {POLYSEER_API_KEY}"}
)
if resp.status_code != 200:
return None
data = resp.json()
return {
"probability": data["aggregated_probability"],
"confidence": data["confidence_score"],
"agent_analyses": data["individual_agents"],
"methodology": data["methodology"]
}
Polyseer is particularly useful as one signal in your aggregator — it provides an independent Bayesian estimate that you can combine with your own LLM analysis and sentiment signals.
Predly: Mispricing Detection
Predly specializes in detecting mispricings. Rather than building your own edge detection from scratch, you can use Predly’s signals as a filter — only analyze markets where Predly has flagged a potential mispricing:
async def get_predly_alerts(min_confidence: float = 0.7) -> list[dict]:
"""Fetch current mispricing alerts from Predly."""
async with httpx.AsyncClient() as client:
resp = await client.get(
"https://api.predly.ai/v1/alerts",
headers={"Authorization": f"Bearer {PREDLY_API_KEY}"},
params={"min_confidence": min_confidence}
)
return resp.json().get("alerts", [])
async def predly_filtered_pipeline():
"""Only run full analysis on markets Predly flags."""
alerts = await get_predly_alerts(min_confidence=0.75)
for alert in alerts:
# Predly found a mispricing — run full analysis to confirm
analysis = await research_and_analyze(
alert["market_question"], alert["current_price"]
)
edge = calculate_edge(
analysis.probability, alert["current_price"],
analysis.confidence / 10
)
if edge["should_bet"]:
print(f"Confirmed: {alert['market_question']}")
print(f" Predly says: {alert['predicted_direction']}")
print(f" Our analysis: {edge['direction']} "
f"with {edge['net_edge_after_fees']:.1%} edge")
This two-stage approach (Predly scans → your agent confirms) reduces LLM API costs by only running expensive analysis on promising opportunities.
Backtesting Basics
Never deploy real money on an untested strategy. Backtesting replays historical market data through your strategy logic to see how it would have performed.
Simple Historical Replay
from dataclasses import dataclass
@dataclass
class BacktestTrade:
market_id: str
direction: str # "yes" or "no"
entry_price: float
exit_price: float # 1.0 if correct, 0.0 if wrong
size: float
profit: float
def backtest_strategy(historical_markets: list[dict],
strategy_fn, initial_bankroll: float = 1000.0) -> dict:
"""Run a strategy against historical market data.
Args:
historical_markets: List of resolved markets with price history
strategy_fn: Function that takes market data and returns a trade decision
initial_bankroll: Starting capital
Returns:
Performance summary.
"""
bankroll = initial_bankroll
trades: list[BacktestTrade] = []
for market in historical_markets:
decision = strategy_fn(market)
if decision["direction"] == "none":
continue
# Size the position (quarter-Kelly for safety)
fraction = kelly_fraction(
decision["probability"], market["entry_price"]
) * 0.25
size = bankroll * fraction
if size < 1.0: # minimum trade size
continue
# Determine outcome
correct = (
(decision["direction"] == "yes" and market["resolved_yes"]) or
(decision["direction"] == "no" and not market["resolved_yes"])
)
entry = market["entry_price"] if decision["direction"] == "yes" \
else 1 - market["entry_price"]
profit = (1.0 - entry) * size if correct else -entry * size
trades.append(BacktestTrade(
market_id=market["id"],
direction=decision["direction"],
entry_price=entry,
exit_price=1.0 if correct else 0.0,
size=size,
profit=profit
))
bankroll += profit
# Calculate metrics
if not trades:
return {"trades": 0, "message": "No trades generated"}
wins = sum(1 for t in trades if t.profit > 0)
total_profit = sum(t.profit for t in trades)
max_drawdown = _calculate_max_drawdown(trades, initial_bankroll)
return {
"trades": len(trades),
"wins": wins,
"win_rate": wins / len(trades),
"total_profit": round(total_profit, 2),
"return_pct": round(total_profit / initial_bankroll * 100, 2),
"max_drawdown_pct": round(max_drawdown * 100, 2),
"final_bankroll": round(bankroll, 2),
}
def _calculate_max_drawdown(trades: list[BacktestTrade],
initial: float) -> float:
peak = initial
max_dd = 0.0
current = initial
for t in trades:
current += t.profit
peak = max(peak, current)
dd = (peak - current) / peak
max_dd = max(max_dd, dd)
return max_dd
Forward-Testing
Kalshi provides a demo environment where you can test with paper money. Use it to validate your strategy in real market conditions before risking capital. Polymarket does not have a demo mode, so use historical data for Polymarket backtesting.
For a comprehensive backtesting framework including overfitting prevention, proper train/test splits, and performance metrics, see the Backtesting and Strategy Validation Guide (coming soon).
Putting It All Together
Here is a complete agent loop that combines LLM analysis, sentiment, Bayesian updating, signal aggregation, and edge detection:
import asyncio
async def agent_loop(markets: list[dict], interval_seconds: int = 300):
"""Main agent decision loop. Runs continuously."""
while True:
for market in markets:
try:
# 1. OBSERVE: Gather signals
twitter = await fetch_twitter_signals(market["question"])
reddit = await fetch_reddit_signals(
"predictionmarkets", market["question"])
news = await fetch_news_signals(market["question"])
# 2. ANALYZE: Run intelligence pipeline
# 2a. LLM analysis
llm = evaluate_market_structured(
market["question"], market["yes_price"],
market.get("description", "")
)
# 2b. Sentiment scoring
social_sentiment = await score_signals(
twitter + reddit, market["question"])
news_sentiment = await score_signals(news, market["question"])
# 2c. Bayesian update from evidence
evidence = []
for signal in (twitter + reddit + news)[:10]:
lh = estimate_likelihood(signal.text, market["question"])
evidence.append(lh)
bayesian_prob = bayesian_update(market["yes_price"], evidence)
# 2d. Aggregate all signals
agg = SignalAggregator()
agg.add("llm", llm.probability, llm.confidence / 10, weight=3.0)
agg.add("social", (social_sentiment + 1) / 2, 0.5, weight=1.0)
agg.add("news", (news_sentiment + 1) / 2, 0.6, weight=1.5)
agg.add("bayesian", bayesian_prob, 0.7, weight=2.0)
result = agg.aggregate()
# 3. DECIDE: Edge detection
edge = calculate_edge(
result["probability"], market["yes_price"],
result["confidence"]
)
if edge["should_bet"]:
size = kelly_fraction(
result["probability"], market["yes_price"]
) * 0.25 # quarter-Kelly
print(f"TRADE SIGNAL: {market['question']}")
print(f" Direction: {edge['direction']}")
print(f" Edge: {edge['net_edge_after_fees']:.1%}")
print(f" Size: {size:.1%} of bankroll")
# 4. EXECUTE: Send to Layer 3
# await execute_trade(market, edge["direction"], size)
except Exception as e:
print(f"Error analyzing {market.get('question', 'unknown')}: {e}")
continue
await asyncio.sleep(interval_seconds)
The execute_trade call is commented out — uncomment it only after thorough backtesting and paper trading.
Common Pitfalls
Overfitting to recent data. A strategy that would have crushed the last 10 markets may fail on the next 10. Always use out-of-sample testing and be suspicious of strategies with win rates above 80%.
Ignoring fees in edge calculations. Polymarket charges roughly 2% in fees, Kalshi up to 7%. A 5% raw edge on Kalshi is actually negative after fees. Always calculate net edge.
LLM hallucination in probability estimates. LLMs can produce confident-sounding analysis with made-up facts. Always combine LLM output with data from verifiable sources (APIs, price feeds). Never rely on a single LLM call for a trading decision.
Not accounting for LLM knowledge cutoffs. LLMs don’t know what happened yesterday. The research → analyze → decide pattern in this guide solves this by feeding real-time data into the analysis stage.
Prompt injection risks. If your agent ingests text from social media or user-generated content, malicious actors could craft posts designed to manipulate your LLM’s analysis. Use input sanitization and see the Security Best Practices Guide for mitigation strategies.
Betting too large. Full Kelly sizing leads to large drawdowns. Start with quarter-Kelly and increase only after you have statistical evidence that your edge is real (100+ trades minimum).
Where This Fits in the Agent Betting Stack
This guide covers Layer 4 (Intelligence) of the Agent Betting Stack. It assumes you have the other layers in place:
- Layer 1 — Identity: Moltbook Identity Guide — register your agent and build reputation
- Layer 2 — Wallet: Wallet Comparison Guide — fund your agent with the right wallet
- Layer 3 — Trading: Polymarket API Guide | Kalshi API Guide — execute trades
- Cross-cutting: Security Best Practices — protect keys, prevent prompt injection
Related Tools
- OpenClaw — Agent framework with skill system and memory
- Polyseer — Multi-agent Bayesian analysis platform
- Predly — AI-powered mispricing detection
- Dome — Unified prediction market data API
Official Resources
- Anthropic API Documentation — Claude models used in this guide
- Agent Betting Glossary — Every term defined
- Tool Directory — All tools in the ecosystem