Every prediction market bot seller has a performance chart that goes up and to the right. The question is whether it is real.
The agent marketplace is growing fast. Developers are selling bots on GitHub, through Discord channels, and on purpose-built marketplaces. Some of these agents are genuinely valuable — proven strategies with documented edge, rigorous risk management, and months of live trading data. Others are backtests overfit to historical data, packaged with slick equity curves and sold to buyers who do not know how to tell the difference.
This guide is your defense. It covers, in detail, how to verify a prediction market bot’s performance claims before you spend a dollar. Not the marketing claims. Not the testimonials. The actual data — what it means, how to check it, and what should make you walk away.
If you are evaluating a bot to buy or rent, this is the verification layer you need before making a decision.
The Verification Hierarchy
Not all performance evidence is created equal. There is a clear hierarchy of trustworthiness, and understanding it saves you from the most common buyer mistakes.
```
TRUST LEVEL (lowest to highest)
┌─────────────────────────────────────────────────────┐
│ Level 5 — ON-CHAIN VERIFIED LIVE TRADING            │
│ Wallet address on Polygon with full trade history   │
│ Independently verifiable. Cannot be fabricated.     │
├─────────────────────────────────────────────────────┤
│ Level 4 — API-VERIFIED LIVE TRADING                 │
│ Kalshi API trade logs with timestamps               │
│ Harder to fake, but not immutable.                  │
├─────────────────────────────────────────────────────┤
│ Level 3 — LIVE TRADING (SELF-REPORTED)              │
│ Screenshots, spreadsheets, dashboard exports        │
│ Easily fabricated. Requires trust in seller.        │
├─────────────────────────────────────────────────────┤
│ Level 2 — PAPER TRADING                             │
│ Simulated trades against live market data           │
│ No real money at risk. Execution not tested.        │
├─────────────────────────────────────────────────────┤
│ Level 1 — BACKTESTING                               │
│ Historical simulation against past data             │
│ Useful for strategy logic. Easily overfit.          │
└─────────────────────────────────────────────────────┘
```
The rule is simple: demand the highest level of evidence the platform supports. For Polymarket bots, that means on-chain verification. For Kalshi bots, that means API trade logs. Never accept backtests alone as proof that a bot works.
A seller who refuses to provide verifiable data — who only offers backtests or self-reported screenshots — is either hiding poor performance or has not traded with real money. Either way, you should not be their first live test.
Backtesting: Useful but Dangerous
Backtests are how most bot sellers present performance. They simulate the bot’s strategy against historical market data and produce a set of results that show what would have happened. This is useful for understanding the strategy’s logic and identifying obvious flaws. It is dangerous because it is trivially easy to produce a backtest that looks extraordinary but predicts nothing about future performance.
What Backtests Can Tell You
A well-constructed backtest reveals the strategy’s mechanics. You can see how the bot identifies trading opportunities, how it sizes positions, how it handles losing streaks, and what market conditions favor or hurt it. If a backtest shows the bot only trades during high-volatility election markets and you want something that works year-round, that is valuable information.
Backtests also serve as a sanity check. If a seller claims their bot generates 5% monthly returns but the backtest shows 50% monthly returns, the numbers do not add up and something is being misrepresented.
Common Backtest Failures
Every buyer should understand these four failure modes. They are how bad backtests become convincing sales tools.
Overfitting. The most common and most dangerous problem. A strategy with many tunable parameters can be fit to any historical dataset. The bot developer tweaks thresholds, look-back windows, and position sizes until the backtest produces attractive returns — but those parameters are tuned to the past, not predictive of the future. The more parameters a strategy has, the more suspicious you should be of its backtest results.
A quick overfitting test: ask the seller how many parameters the strategy has and how they were selected. If the answer involves optimization over the same data used for the backtest, the results are almost certainly overfit.
Look-ahead bias. This happens when the backtest uses information that would not have been available at trade time. Common examples in prediction markets: using the final resolution time of an event when the bot would have only known the estimated resolution date, or using news sentiment scores that were calculated after the market had already moved. Look-ahead bias inflates returns because the bot effectively trades with future knowledge.
Survivorship bias. The seller tested 30 strategy variations and shows you the one that produced the best backtest. You are seeing the survivor, not the 29 failures. This is not necessarily intentional deception — it is how most strategy development works. But it means the shown backtest overstates expected performance because it was selected for being the best result, not a typical one.
Unrealistic execution assumptions. Backtests commonly assume that every order fills at the displayed price with zero slippage. In real prediction markets, especially thin ones, your order moves the price. A bot that backtests well on a market with $500 daily volume will not achieve the same fills in practice because its own trades will push prices against it.
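A cheap way to stress-test this assumption is to rerun the seller's reported fills through even a crude impact model and see how much of the edge survives. The sketch below is illustrative only: the linear impact coefficient is a placeholder you would calibrate per market, not a published Polymarket or Kalshi constant.

```python
def fill_price_with_impact(quoted_price: float, order_size_usd: float,
                           daily_volume_usd: float, impact_coeff: float = 0.1) -> float:
    """Adjust a quoted fill price for market impact on a thin market.

    Hypothetical linear model: pushing `order_size_usd` through a market
    with `daily_volume_usd` of volume shifts the price by
    impact_coeff * (size / volume). The coefficient is an assumption
    to be calibrated per market.
    """
    impact = impact_coeff * (order_size_usd / daily_volume_usd)
    # Prediction market prices are bounded, so cap near 1.0
    return min(0.99, quoted_price + impact)


# A $100 buy on a $500/day market: the quoted 0.40 slips to 0.42
print(fill_price_with_impact(0.40, 100, 500))
```

If repricing every fill this way turns a profitable backtest into a flat one, the strategy's edge lives inside the execution assumptions, not the signal.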
Minimum Backtest Standards
If a seller provides backtests, here is the minimum you should require. Anything less and the backtest is not useful for evaluation.
| Requirement | What to Check | Why It Matters |
|---|---|---|
| 6+ months of data | Does the backtest cover multiple market regimes? | Short backtests can look good by luck alone |
| Out-of-sample testing | Was the strategy developed on data separate from the test data? | In-sample testing guarantees overfitting |
| Realistic fee model | Are platform fees, gas costs, and spread included? | Fees eat 1-3% per trade on thin markets |
| Slippage model | Does the backtest account for market impact? | Ignoring slippage inflates returns by 10-30% on small markets |
| Walk-forward analysis | Was the strategy re-optimized periodically and tested on subsequent data? | Single-period optimization guarantees overfitting |
| Multiple market types | Was the strategy tested across politics, crypto, sports, economics? | Single-event backtests tell you nothing about generalization |
| Drawdown data included | Does the backtest show max drawdown and drawdown periods? | Hiding drawdowns is the most common deception |
| Parameter count disclosed | How many tunable parameters does the strategy have? | More parameters = higher overfitting risk |
If the seller cannot or will not provide backtests meeting these standards, treat their performance claims as unverified marketing.
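To make the out-of-sample and walk-forward requirements concrete, here is a minimal sketch of how walk-forward windows are generated. The window sizes are arbitrary examples; the point is that each test window follows its training window in time, so the strategy is never evaluated on data it was optimized against.

```python
def walk_forward_splits(n_days: int, train_days: int, test_days: int):
    """Yield (train_range, test_range) index windows for walk-forward analysis.

    Each fold optimizes parameters on `train_days` of history, then
    evaluates on the `test_days` that immediately follow.
    """
    start = 0
    while start + train_days + test_days <= n_days:
        train = range(start, start + train_days)
        test = range(start + train_days, start + train_days + test_days)
        yield train, test
        start += test_days  # roll the window forward by one test period


# 180 days of data, 90-day train windows, 30-day test windows -> 3 folds
folds = list(walk_forward_splits(180, 90, 30))
print(len(folds))  # 3
```

A seller who reports only the concatenated test-window performance, never the training-window performance, is doing this correctly.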
Live Track Record Evaluation
Live track records are where verification gets real. A bot that has traded with actual money, on actual markets, with verifiable transaction history, has passed a test that no backtest can replicate.
Polymarket: On-Chain Verification
Polymarket trades settle on the Polygon blockchain. This is a verification gift. If a seller gives you their wallet address, you can independently verify every single trade, every timestamp, and every dollar of profit or loss. On-chain data cannot be fabricated, backdated, or selectively edited.
Here is how to verify a Polymarket bot’s track record using the subgraph API:
```python
import requests
import json
from datetime import datetime


def verify_polymarket_wallet(wallet_address: str) -> dict:
    """
    Query Polymarket's subgraph for a wallet's complete trade history.
    Returns trade count, volume, and P&L summary.
    """
    # Polymarket uses a subgraph on The Graph protocol
    subgraph_url = "https://api.thegraph.com/subgraphs/name/polymarket/polymarket-matic"

    query = """
    {
      transactions(
        where: { user: "%s" }
        orderBy: timestamp
        orderDirection: desc
        first: 1000
      ) {
        id
        type
        timestamp
        market {
          question
          slug
        }
        tradeAmount
        outcomeIndex
        price
        feeAmount
      }
    }
    """ % wallet_address.lower()

    response = requests.post(subgraph_url, json={"query": query})
    data = response.json()
    # Note: `first: 1000` caps the result set -- paginate for wallets
    # with longer histories, or you will undercount trades.
    transactions = data.get("data", {}).get("transactions", [])

    if not transactions:
        return {
            "wallet": wallet_address,
            "status": "NO_TRADES_FOUND",
            "warning": "This wallet has no recorded Polymarket trades"
        }

    # Calculate summary statistics
    total_volume = sum(float(tx["tradeAmount"]) for tx in transactions)
    trade_count = len(transactions)

    # Get date range
    timestamps = [int(tx["timestamp"]) for tx in transactions]
    first_trade = datetime.fromtimestamp(min(timestamps))
    last_trade = datetime.fromtimestamp(max(timestamps))
    trading_days = (last_trade - first_trade).days

    return {
        "wallet": wallet_address,
        "total_trades": trade_count,
        "total_volume_usdc": round(total_volume, 2),
        "first_trade": first_trade.isoformat(),
        "last_trade": last_trade.isoformat(),
        "trading_days": trading_days,
        "transactions": transactions  # Full trade log for detailed analysis
    }


# Usage
result = verify_polymarket_wallet("0xYOUR_SELLER_WALLET_ADDRESS")
print(json.dumps(result, indent=2))
```
You can also verify directly using a block explorer. Go to Polygonscan, paste the wallet address, and look at the token transfers. Every Polymarket trade shows up as a USDC transfer and a conditional token transfer. The blockchain does not lie.
What to check in on-chain data:
- Trade timestamps. Do they match the seller’s claimed operating period? A seller claiming six months of live trading should have six months of on-chain activity.
- Trade frequency. Does the volume of trades match what the strategy would logically produce? An arbitrage bot should have many small trades. A sentiment bot should have fewer, larger trades.
- P&L consistency. Calculate the actual profit and loss from the trade data. Does it match the seller’s claimed returns? Discrepancies of more than 5% are a red flag.
- Market diversity. Is the bot trading across multiple markets, or are all profits from a single lucky bet?
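As a starting point for the P&L check, a sketch like the following sums cash flows from the transactions returned by the query above. The "Buy"/"Sell" values for the `type` field are an assumption about the subgraph schema; confirm them against real query output before trusting the result.

```python
def net_pnl_from_transactions(transactions: list) -> float:
    """Net USDC cash flow implied by a list of subgraph transactions.

    Field names (`type`, `tradeAmount`, `feeAmount`) mirror the query
    above. The "Buy"/"Sell" type values are assumed -- verify them
    against the actual schema.
    """
    pnl = 0.0
    for tx in transactions:
        amount = float(tx["tradeAmount"])
        fee = float(tx.get("feeAmount") or 0)
        if tx["type"] == "Buy":
            pnl -= amount + fee   # cash out: bought outcome tokens
        else:
            pnl += amount - fee   # cash in: sale or redemption
    return round(pnl, 2)


sample = [
    {"type": "Buy", "tradeAmount": "100", "feeAmount": "1"},
    {"type": "Sell", "tradeAmount": "130", "feeAmount": "1"},
]
print(net_pnl_from_transactions(sample))  # 28.0
```

Compare this number, not the seller's screenshot, against the claimed returns.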
Kalshi: API-Based Verification
Kalshi operates off-chain, so there is no blockchain to inspect. Verification relies on the seller providing API-exportable trade logs directly from the Kalshi platform.
```python
import requests
from datetime import datetime


def request_kalshi_trade_log(api_key: str, start_date: str, end_date: str) -> dict:
    """
    Pull trade history from Kalshi API for verification.
    The seller should run this and share the output,
    or grant read-only API access for independent verification.
    """
    base_url = "https://trading-api.kalshi.com/trade-api/v2"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    # Fetch fills (executed trades)
    params = {
        "min_ts": int(datetime.fromisoformat(start_date).timestamp()),
        "max_ts": int(datetime.fromisoformat(end_date).timestamp()),
        "limit": 1000
    }
    response = requests.get(
        f"{base_url}/portfolio/fills",
        headers=headers,
        params=params
    )
    fills = response.json().get("fills", [])

    # Summarize
    total_trades = len(fills)
    total_pnl = sum(f.get("realized_pnl", 0) for f in fills)
    total_fees = sum(f.get("fee", 0) for f in fills)

    return {
        "total_trades": total_trades,
        "total_pnl_cents": total_pnl,
        "total_fees_cents": total_fees,
        "net_pnl_cents": total_pnl - total_fees,
        "fills": fills  # Full trade log
    }
```
Kalshi trade logs are less trustworthy than on-chain data because they can theoretically be manipulated before sharing. To mitigate this:
- Ask the seller to grant you read-only API access so you can pull the data yourself.
- Cross-reference trade timestamps against known market events. If the seller claims a profit on a political market that resolved on March 15, verify that trades actually occurred before that resolution.
- Check that the fill prices are realistic. If the bot claims to have bought YES at $0.30 on a market where the price never dropped below $0.50, the data is fabricated.
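That last check is easy to automate. A sketch, assuming you have independently pulled each market's historical price range yourself (the `ticker` and `price` field names mirror the fills structure above and may differ in the real schema):

```python
def flag_implausible_fills(fills: list, price_ranges: dict) -> list:
    """Return fills whose price falls outside the market's known range.

    `price_ranges` maps ticker -> (low, high), built from an independent
    source such as candlestick history you download yourself -- never
    from seller-provided data.
    """
    suspicious = []
    for f in fills:
        low, high = price_ranges.get(f["ticker"], (0.0, 1.0))
        if not (low <= f["price"] <= high):
            suspicious.append(f)  # fill price never existed in the market
    return suspicious


# Claimed buy at $0.30 on a market that never traded below $0.50
fills = [{"ticker": "ELECTION-24", "price": 0.30}]
ranges = {"ELECTION-24": (0.50, 0.80)}
print(len(flag_implausible_fills(fills, ranges)))  # 1
```

A single implausible fill is enough to discard the entire log: if one row is fabricated, none of them can be trusted.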
Minimum Live Track Record Requirements
Regardless of platform, demand these minimums before taking any live track record seriously:
- 3 months of trading — anything shorter is statistically meaningless in prediction markets where individual bets can take weeks to resolve
- 100+ trades — you need enough data points for statistical significance; 20 trades could be luck
- Multiple market conditions — performance during a high-volatility election cycle tells you nothing about performance during quiet periods; demand data spanning different regimes
- Drawdown periods visible — a track record that shows only upward movement is either fake or has been cropped to hide losses
Key Performance Metrics
Numbers without context are meaningless. A 70% win rate sounds impressive until you learn the average loss is three times the average win. Here are the metrics that matter, what they actually measure, and what ranges indicate a legitimate, well-performing bot.
| Metric | What It Measures | Good | Acceptable | Red Flag |
|---|---|---|---|---|
| Sharpe Ratio | Return per unit of risk (volatility-adjusted) | > 2.0 | 1.0 - 2.0 | < 0.5 or unreported |
| Sortino Ratio | Return per unit of downside risk | > 2.5 | 1.0 - 2.5 | < 0.5 or unreported |
| Max Drawdown | Largest peak-to-trough capital decline | < 10% | 10% - 20% | > 30% or unreported |
| Win Rate | Percentage of trades that are profitable | 55% - 70% (directional), 75%+ (arb) | 45% - 55% with high avg win/loss ratio | < 40% without justification |
| Profit Factor | Gross profit / gross loss | > 2.0 | 1.5 - 2.0 | < 1.2 |
| Avg Trade Duration | Mean time a position is held | Consistent with claimed strategy | Slight mismatch | Completely inconsistent with strategy |
| Calmar Ratio | Annualized return / max drawdown | > 3.0 | 1.0 - 3.0 | < 0.5 |
| Trade Count | Number of trades in the evaluation period | 200+ | 100 - 200 | < 50 |
How to Read These Metrics Together
No single metric tells the full story. Here is how to combine them.
Sharpe Ratio is the single most useful metric because it adjusts returns for risk. A bot returning 5% monthly with 2% volatility (Sharpe around 2.5) is far better than a bot returning 15% monthly with 20% volatility (Sharpe around 0.75). If a seller reports only raw returns without a Sharpe ratio, they are likely hiding high volatility.
Sortino Ratio is a refinement of Sharpe that only penalizes downside volatility. This matters for prediction market bots because upside volatility (unexpectedly large wins) is not a problem. If the Sortino is significantly higher than the Sharpe, the bot’s volatility is mostly on the upside — a good sign.
Max Drawdown is your worst-case scenario metric. It tells you the maximum amount you would have lost from a peak in the bot’s equity curve to its subsequent trough. A 20% max drawdown means that at some point, your account was down 20% from its highest point. Ask yourself: could you stomach that loss without shutting the bot off? If the answer is no, the bot’s risk profile does not match your tolerance.
Win Rate and Profit Factor together tell the real story. A bot with a 90% win rate and a profit factor of 1.1 is making many small wins and occasional catastrophic losses. A bot with a 45% win rate and a profit factor of 2.5 is losing often but winning big when it does win. Both can be profitable — but the second is typically more robust because it does not depend on avoiding rare large losses.
Calmar Ratio combines returns and drawdowns into a single number. It answers the question: how much return am I getting per unit of maximum pain? A Calmar above 3.0 means the bot’s annualized return is at least three times its worst drawdown. Below 1.0 means you are suffering more drawdown than you are earning in returns.
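Rather than trusting seller-reported ratios, compute them yourself from the raw per-period return series. A minimal sketch, assuming daily returns and the conventional 252-trading-day annualization (adjust the factor if your data is weekly or per-trade):

```python
import statistics


def performance_metrics(returns: list) -> dict:
    """Compute core risk-adjusted metrics from a series of daily returns.

    Annualization via 252 trading days is the usual convention,
    not a prediction-market standard.
    """
    mean = statistics.mean(returns)
    vol = statistics.stdev(returns)
    downside = [r for r in returns if r < 0]
    downside_vol = statistics.stdev(downside) if len(downside) > 1 else float("inf")

    # Build the equity curve and track the worst peak-to-trough decline
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        max_dd = max(max_dd, (peak - equity) / peak)

    gross_profit = sum(r for r in returns if r > 0)
    gross_loss = -sum(downside)
    ann_return = mean * 252

    return {
        "sharpe": (mean / vol) * 252 ** 0.5 if vol else float("inf"),
        "sortino": (mean / downside_vol) * 252 ** 0.5,
        "max_drawdown": max_dd,
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "calmar": ann_return / max_dd if max_dd else float("inf"),
    }
```

Run this on the seller's raw P&L data and compare against the numbers in their listing. A seller whose self-reported Sharpe survives your independent calculation has passed a real test.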
Red Flags in Performance Presentations
Here is what should trigger immediate skepticism when evaluating a bot’s performance claims.
Cherry-Picked Time Periods
The most common deception. The seller shows you a three-month backtest window where the bot returned 40%, but conveniently omits the six months before and after where it lost money. Always ask: why this specific time period? If the answer is not “this is the full available dataset,” demand the complete history.
A specific tell: the backtest period starts or ends suspiciously close to a major market event (an election, a crypto crash, a surprise resolution). The bot probably performed well during that event and poorly otherwise.
Missing Drawdown Data
If the performance presentation includes a total return figure but no maximum drawdown, the seller is hiding something. Every strategy experiences drawdowns. A presentation without drawdown data is like a resume that lists achievements but refuses to explain gaps in employment.
Demand the full equity curve, not just the endpoint. The shape of the curve matters as much as the final number. A bot that made 30% but spent four months in a 25% drawdown before recovering is a very different product than one that made 30% with a smooth upward trajectory and 8% max drawdown.
Unrealistic Returns
Prediction markets have structural limits on returns. Arbitrage opportunities are typically small (1-5% per trade). Sentiment-driven edges decay as markets become more efficient. Market-making spreads are thin. If a bot claims consistent 30%+ monthly returns, one of the following is true:
- The results are fabricated
- The results are from a cherry-picked period
- The results are from a tiny sample size (a few trades that happened to go well)
- The bot is taking extreme risk that is not reflected in the reported metrics
For calibration: a strong prediction market bot producing 5-10% monthly returns consistently, with a Sharpe above 1.5 and max drawdown under 15%, is a genuinely excellent product. Anything claiming multiples of that should be treated as guilty until proven innocent.
No Live Data
A bot with backtests only — even excellent backtests — is unproven. Backtests and live performance diverge for reasons that cannot be simulated: real execution latency, real liquidity constraints, real API failures, real emotional pressure (the developer changing parameters during a drawdown).
If the seller says the bot is “ready to launch” but has never traded live, you are being asked to be the beta tester with your money. That might be fine at a steep discount. It is not fine at full price.
Smoothed Equity Curves
Real equity curves are jagged. They have drawdowns, flat periods, and sudden jumps. If the equity curve looks like a smooth upward line, it has been either fabricated or smoothed to hide volatility. Ask for daily or per-trade P&L data, not just a monthly summary. The granular data tells the truth that the summary obscures.
Returns Without Absolute Numbers
A bot that reports “127% return” without telling you it was on a $200 starting balance is technically accurate and completely misleading. Small accounts can achieve outsized percentage returns through a few lucky bets. Those returns do not scale.
Always ask: what was the starting capital, what was the ending capital, and what was the maximum capital deployed? A 50% return on $100,000 deployed over six months is a serious result. A 50% return on $500 over two weeks is noise.
Refusal to Share Wallet Address or Trade Logs
This is the biggest red flag of all. A legitimate Polymarket bot seller has a wallet address with a verifiable trade history. Sharing it costs them nothing and proves everything. If they refuse, the most likely explanation is that the wallet does not show what they claim it shows.
The same applies to Kalshi sellers who refuse to share API trade logs or grant read-only access. If the performance is real, there is no reason to hide the data.
Third-Party Audit Framework
As the prediction market agent marketplace matures, formal verification infrastructure is emerging. Here is what an audit covers, how to do it yourself, and when it is worth paying someone else.
What an Audit Covers
A thorough third-party audit of a prediction market bot evaluates:
- Code review — Is the strategy implemented correctly? Are there bugs in the position sizing, fee calculation, or order execution logic?
- Backtest reproduction — Does the seller’s claimed backtest match when the auditor runs the same code against the same data?
- Live performance verification — Does the on-chain or API-based trade history match the seller’s claims?
- Risk analysis — What are the tail risks? What happens during extreme market conditions? What is the maximum possible loss?
- Infrastructure review — Is the deployment secure? Are API keys handled properly? Does the bot fail gracefully?
DIY Verification Checklist
You do not need to hire an auditor for most purchases. Here is a step-by-step process you can follow yourself:
```
PRE-PURCHASE VERIFICATION CHECKLIST
[ ] 1.  Request wallet address (Polymarket) or read-only API access (Kalshi)
[ ] 2.  Independently verify trade history using on-chain data or API
[ ] 3.  Calculate actual P&L from raw trade data — compare to seller's claims
[ ] 4.  Check trade timestamps against known market events
[ ] 5.  Verify that fill prices are within historical bid-ask spreads
[ ] 6.  Calculate Sharpe ratio, max drawdown, and profit factor from raw data
[ ] 7.  Compare backtest results to live results — significant gap = overfitting
[ ] 8.  Check that trade count is sufficient for statistical significance (100+)
[ ] 9.  Verify the bot traded through at least one drawdown period
[ ] 10. Ask the seller to explain the strategy at a conceptual level
[ ] 11. Request the number of tunable parameters and how they were selected
[ ] 12. Run a paper trade for 2-4 weeks before committing capital
```
When to Hire an Auditor
Pay for a professional audit when:
- The purchase price exceeds $5,000, or you plan to allocate more than $25,000 in capital to the bot
- The strategy is complex enough that you cannot evaluate the code yourself
- The seller provides source code and you want an independent review of its quality and correctness
- You are a fund or institution with fiduciary obligations
The cost of a professional bot audit ranges from $1,000 to $5,000 depending on complexity. That is cheap insurance on a five-figure capital deployment.
For verifying agent identity and reputation before purchasing, Moltbook’s identity layer provides cryptographically verifiable reputation scores. An agent with high karma, a long history, and a verified operator is substantially more trustworthy than an anonymous listing.
Verification in Practice: Step-by-Step Process
Here is the complete process for evaluating a prediction market bot, from first contact to capital deployment. Follow it in order. Skipping steps is how buyers lose money.
Step 1: Initial Screening (10 minutes)
Before deep analysis, eliminate obviously bad options. Check for:
- Does the listing include a strategy description, or is it vague buzzwords?
- Are any performance metrics reported (Sharpe, drawdown, win rate)?
- Is there a live track record, or only backtests?
- Does the seller have a public identity — a Moltbook profile, GitHub history, or community presence?
- Is the pricing reasonable for the claimed strategy type?
If the answer to any of these is no, move on. There are enough bots in the marketplace that you do not need to gamble on opaque listings.
Step 2: Data Collection (30 minutes)
Request the raw data from the seller:
- Wallet address for Polymarket bots
- Read-only API access or trade log export for Kalshi bots
- Full backtest results with methodology disclosure
- Parameter count and optimization methodology
- The complete equity curve including drawdown periods
A serious seller has this ready. If collecting this data requires extensive back-and-forth, it is a signal that the seller has not been through this process before — which means you are likely their first serious buyer.
Step 3: Independent Verification (1-2 hours)
Using the code examples from the live track record section above, independently verify:
- Total trade count matches the seller’s claim
- Total P&L matches within 5% (accounting for rounding and fee calculation differences)
- Trade timestamps are consistent with the claimed operating period
- Fill prices are realistic (within historical bid-ask spreads)
- Performance metrics (Sharpe, drawdown, profit factor) match when calculated from raw data
Document any discrepancies. Small rounding differences are normal. P&L that is off by 20% is a deal-breaker.
Step 4: Backtest vs. Live Comparison (30 minutes)
If the seller provides both backtests and live results, compare them. A gap between backtest and live performance is expected — backtests always look better than live trading. But the gap tells you something:
- Backtest 10% monthly, live 7% monthly — Normal decay from backtest to live. The bot probably works.
- Backtest 30% monthly, live 10% monthly — Significant overfitting in the backtest. The live results might be real, but the backtest is unreliable for projecting future performance.
- Backtest 20% monthly, live 3% monthly — The backtest was deeply overfit. The live results might also be inflated by favorable conditions.
- Backtest 15% monthly, live -5% monthly — The strategy does not work in production. Walk away.
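These cases can be folded into a quick rule-of-thumb check. The cutoffs below are heuristics derived from the examples above, not statistical tests; treat the verdicts as prompts for further questions, not final answers.

```python
def decay_verdict(backtest_monthly: float, live_monthly: float) -> str:
    """Rough interpretation of the backtest-to-live performance gap.

    Thresholds are heuristic: live retaining 60%+ of backtest returns
    is normal decay; below that, suspect overfitting.
    """
    if live_monthly <= 0:
        return "strategy fails in production -- walk away"
    ratio = live_monthly / backtest_monthly
    if ratio >= 0.6:
        return "normal decay -- the bot probably works"
    if ratio >= 0.3:
        return "significant overfitting in the backtest"
    return "deeply overfit backtest -- treat live results skeptically"


print(decay_verdict(0.10, 0.07))  # normal decay -- the bot probably works
```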
Step 5: Paper Trading (2-4 weeks)
Before committing real capital, run the bot in paper-trading mode. This is non-negotiable. During the paper trading period, verify:
- The bot connects to platforms and executes as expected
- Trade frequency matches historical patterns
- Risk parameters are respected
- The bot handles API errors, rate limits, and edge cases gracefully
- Simulated returns are roughly consistent with the verified live track record
Our Polymarket rate limits guide covers the API constraints your bot will encounter during testing.
Step 6: Gradual Capital Deployment
After successful paper trading, deploy real capital in stages:
- Start with 10-20% of your planned allocation
- Run for two weeks; compare results to paper trading
- If consistent, increase to 50%
- After two more weeks, scale to full allocation
- Continue monitoring indefinitely — no bot is set-and-forget
For wallet setup and risk parameter configuration, the buyer’s guide covers the full onboarding process, and the wallet comparison helps you choose the right wallet with appropriate spending limits.
What’s Next
Verification is the most important step in the bot-buying process, but it is not the only step. Here are the logical next moves depending on where you are.
Ready to evaluate specific bots? Browse the AgentBets marketplace and apply this verification framework to actual listings. The tools directory covers the platforms and infrastructure referenced throughout this guide.
Need the full buyer’s guide? Our How to Buy a Prediction Market Agent guide covers everything beyond verification — pricing, licensing models, onboarding, and risk management.
Looking at the seller’s perspective? If you are building bots and want to understand what buyers expect, the How to Sell a Prediction Market Bot guide covers packaging, pricing, and how to present performance data that passes buyer scrutiny.
Want to understand the broader marketplace? The Prediction Market Agent Marketplace guide maps the entire ecosystem — who the players are, how commerce works, and where the category is heading.
Building your own instead? Sometimes the best outcome of a verification process is deciding to build rather than buy. The Polymarket Trading Bot Quickstart gets you from zero to a working bot in 30 minutes, and the Agent Betting Stack guide covers the full four-layer architecture.
Trust the data. Verify everything. Deploy gradually. That is how you buy a prediction market bot without getting burned.