Efficient market prices follow a martingale — E[P(t+1) | F(t)] = P(t): the expected next price, conditional on all current information, equals the current price. When they don’t, something is wrong. Four statistical tests detect manipulation: volume z-scores, variance ratio tests, Benford’s law on trade sizes, and graph-cycle detection for wash trading. Run all four as pre-trade filters before your agent touches any market.
Why This Matters for Agents
An autonomous betting agent that enters a manipulated market is burning capital. The agent’s expected value calculations assume prices reflect genuine information aggregation. When a whale pumps a market by 15 points with no new information, the agent’s model sees “mispricing” that isn’t mispricing — it’s a trap. The agent buys, the whale dumps, and the agent eats the loss.
This is Layer 4 — Intelligence. Manipulation detection sits in the agent’s pre-trade filter pipeline, between price ingestion (Layer 3) and position decision (Layer 4). The data flows through the Prediction Market API Reference endpoints, gets screened by the detection module described here, and only clean signals pass through to the Kelly sizing and execution layer. Every false positive (flagging a legitimate move as manipulation) costs opportunity. Every false negative (missing real manipulation) costs capital. The math that follows calibrates this tradeoff.
The Math
The Martingale Property of Efficient Prices
In an informationally efficient market, prices follow a martingale process:
E[P(t+1) | F(t)] = P(t)
where P(t) is the price at time t and F(t) is the information set available at time t. The expected future price, given everything known now, equals the current price. Price changes are unpredictable — if they were predictable, traders would already have traded on that prediction.
This is the null hypothesis for every manipulation test. If price changes are predictable in specific patterns — mean-reverting after spikes, correlated with single-wallet activity, occurring on artificial volume — the martingale property is violated, and manipulation is a leading candidate explanation.
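The null is easy to sanity-check in a few lines. A minimal sketch, using simulated data rather than real market prices: a martingale path has increments with zero conditional mean, so its price changes show no exploitable lag-1 autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(0)

# A martingale path: increments have zero conditional mean, so price
# changes carry no predictable pattern.
prices = 0.50 + np.cumsum(rng.normal(0, 0.005, 5000))
returns = np.diff(prices)

# Under the null, the lag-1 autocorrelation of changes is ~0.
rho1 = float(np.corrcoef(returns[1:], returns[:-1])[0, 1])
print(f"lag-1 autocorrelation: {rho1:+.3f}")
```

Every test below is a different way of measuring departures from this baseline.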
Signature 1: Abnormal Volume Spikes
Legitimate information events (election results, earnings announcements, court rulings) produce correlated volume-and-price moves. Manipulation produces volume spikes without corresponding information.
The detection metric is a rolling z-score on volume:
z(t) = (V(t) - μ_V(t)) / σ_V(t)
where V(t) is volume at time t, μ_V(t) is the rolling mean volume over a lookback window (typically 24-72 hours), and σ_V(t) is the rolling standard deviation. A z-score exceeding 3.0 flags the period as anomalous.
The critical refinement: pair volume anomalies with an information calendar. If z > 3 coincides with a known news event (debate, primary, data release), it’s likely legitimate. If z > 3 occurs at 3 AM with no scheduled information catalyst, the probability of manipulation rises sharply.
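A minimal sketch of the z-score plus calendar check, on hypothetical hourly volumes (the `scheduled_events` set stands in for whatever information calendar the agent maintains):

```python
import numpy as np

def volume_zscore(volumes: np.ndarray, lookback: int = 72) -> float:
    """Z-score of the latest observation vs. the trailing lookback window."""
    window = volumes[-(lookback + 1):-1]
    mu, sigma = window.mean(), window.std(ddof=1)
    return 0.0 if sigma == 0 else float((volumes[-1] - mu) / sigma)

# Hypothetical hourly volumes: steady around $1,000, then a sudden spike.
rng = np.random.default_rng(1)
vols = np.append(rng.normal(1000, 100, 72), 6000.0)

z = volume_zscore(vols)
scheduled_events: set[str] = set()  # populated from a news/info calendar
flagged = z > 3.0 and not scheduled_events  # anomalous AND no known catalyst
print(f"z = {z:.1f}, flagged = {flagged}")
```

The same spike with a debate or data release on the calendar would pass the filter — only uncatalyzed anomalies are flagged.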
Signature 2: Price-Volume Divergence
In healthy markets, large price moves require large volume — it takes capital to move prices. The Amihud illiquidity ratio captures this:
ILLIQ(t) = |r(t)| / V(t)
where r(t) = (P(t) - P(t-1)) / P(t-1) is the return and V(t) is dollar volume. A spike in ILLIQ means the price moved significantly on thin volume — exactly what happens when a manipulator trades in an illiquid market to generate maximum price impact per dollar spent.
For prediction markets specifically, track the ratio of price change to orderbook depth consumed:
Impact Ratio = |ΔP| / Depth_Consumed
where Depth_Consumed is the total dollar value of resting orders filled during the price move. High impact ratios on Polymarket’s CLOB suggest the move consumed thin liquidity rather than representing broad market consensus.
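Both ratios are one-liners once the series are in hand. A sketch with hypothetical numbers, contrasting the same 4-cent move on thin versus deep volume:

```python
import numpy as np

def amihud(prices: np.ndarray, dollar_volumes: np.ndarray) -> np.ndarray:
    """|return| per dollar traded; spikes mean the price moved on thin volume."""
    returns = np.abs(np.diff(prices) / prices[:-1])
    return returns / dollar_volumes[1:]

# Hypothetical: the same 4-cent move, once on $500 and once on $50,000.
prices = np.array([0.50, 0.54, 0.58])
vols = np.array([0.0, 500.0, 50_000.0])  # dollar volume per period

illiq = amihud(prices, vols)
print(illiq)  # first move: thin volume -> far higher illiquidity ratio

# Orderbook variant: impact ratio = |price change| / depth consumed.
impact_ratio = abs(0.54 - 0.50) / 500.0
```

The thin-volume move produces an illiquidity ratio two orders of magnitude higher — the signature worth flagging.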
Signature 3: Mean-Reversion Patterns (Pump and Dump)
Manipulated prices exhibit a distinctive pattern: rapid unidirectional movement followed by gradual reversion to the pre-manipulation level. This creates negative autocorrelation in returns at short lags.
The lag-1 autocorrelation of returns:
ρ(1) = Cov(r(t), r(t-1)) / Var(r(t))
In an efficient market, ρ(1) ≈ 0 (returns are uncorrelated). Significantly negative ρ(1) indicates mean-reversion — prices consistently reverse recent moves. For prediction markets sampled at hourly or 15-minute intervals, ρ(1) < -0.15 with statistical significance (p < 0.05) warrants investigation.
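A sketch of the computation on a simulated mean-reverting path (not market data): a price that oscillates around a fixed level has each move largely undone by the next, driving ρ(1) strongly negative.

```python
import numpy as np

def lag1_autocorr(prices: np.ndarray) -> float:
    """Lag-1 autocorrelation of log returns."""
    r = np.diff(np.log(prices))
    return float(np.corrcoef(r[1:], r[:-1])[0, 1])

# Hypothetical mean-reverting market: price oscillates around $0.50, so
# every move is largely reversed by the next one — the reversion signature.
rng = np.random.default_rng(2)
reverting = 0.50 + rng.normal(0, 0.01, 500)

rho = lag1_autocorr(reverting)
print(f"rho(1) = {rho:+.3f}")  # well below the -0.15 threshold
```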
Signature 4: Order Book Spoofing
Spoofing is placing large limit orders with no intention of execution, creating false impressions of supply or demand, then canceling before the orders fill. The detection metric is the order-to-trade ratio (OTR):
OTR = Orders_Placed / Orders_Executed
Legitimate market makers have OTRs between 3:1 and 10:1 (some orders adjust as conditions change). Spoofers have OTRs exceeding 50:1 — they place and cancel at extreme rates. On Polymarket’s CLOB, agents can monitor the WebSocket feed for order placement and cancellation events to compute per-address OTRs.
A complementary metric is the order lifespan distribution. Legitimate orders persist for minutes to hours. Spoofed orders live for seconds. If the median order lifespan at the top-of-book drops below 5 seconds during a price move, the move is likely spoofing-driven.
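Both metrics fall out of a per-address order event log. A sketch on a hypothetical event list (the tuple layout `(action, order_id, time_s)` is an assumption, not any exchange's actual feed format):

```python
from statistics import median

# Hypothetical order event log for one address: (action, order_id, time_s).
events = [
    ("place", 1, 0.0), ("cancel", 1, 1.2),    # lives 1.2s — spoof-like
    ("place", 2, 2.0), ("cancel", 2, 2.5),    # lives 0.5s
    ("place", 3, 5.0), ("fill", 3, 300.0),    # resting order that executed
]

placed = sum(1 for action, _, _ in events if action == "place")
filled = sum(1 for action, _, _ in events if action == "fill")
otr = placed / max(filled, 1)  # order-to-trade ratio for this address

placed_at = {oid: t for action, oid, t in events if action == "place"}
lifespans = sorted(
    t - placed_at[oid] for action, oid, t in events if action != "place"
)
print(f"OTR = {otr:.0f}:1, median lifespan = {median(lifespans):.1f}s")
```

In practice the agent would stream these events per address from the exchange WebSocket and maintain rolling OTR and lifespan statistics.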
Benford’s Law for Trade Size Analysis
Benford’s law states that in naturally occurring numerical datasets, the first digit d (for d = 1, 2, …, 9) appears with probability:
P(d) = log₁₀(1 + 1/d)
| Digit | Expected Frequency |
|---|---|
| 1 | 30.1% |
| 2 | 17.6% |
| 3 | 12.5% |
| 4 | 9.7% |
| 5 | 7.9% |
| 6 | 6.7% |
| 7 | 5.8% |
| 8 | 5.1% |
| 9 | 4.6% |
Natural trade sizes (reflecting real economic decisions with varying order magnitudes) follow this distribution. Bot-generated trade sizes — especially those using round numbers ($100, $500, $1000) or uniform random sizes — deviate measurably.
The test statistic is chi-squared:
χ² = Σ (O(d) - E(d))² / E(d)
where O(d) is the observed count of trades with first digit d and E(d) = N × P(d) is the expected count. With 8 degrees of freedom, χ² > 15.51 rejects the null (Benford-conforming) at α = 0.05.
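A compact sketch of the test on synthetic data: lognormal sizes spanning several orders of magnitude sit near Benford's curve, while round-number bot sizes concentrate on a couple of first digits and blow up the statistic.

```python
import numpy as np
from scipy import stats

def benford_chi2(sizes: np.ndarray) -> tuple[float, float]:
    """Chi-squared statistic and p-value vs. Benford's first-digit law."""
    digits = np.array([int(f"{x:e}"[0]) for x in sizes if x > 0])
    observed = np.bincount(digits, minlength=10)[1:10].astype(float)
    expected = len(digits) * np.log10(1 + 1 / np.arange(1, 10))
    chi2 = float(np.sum((observed - expected) ** 2 / expected))
    return chi2, float(1 - stats.chi2.cdf(chi2, df=8))

rng = np.random.default_rng(3)
natural = rng.lognormal(mean=4, sigma=2.0, size=1000)   # spans magnitudes
bots = rng.choice([100.0, 500.0, 1000.0], size=1000)    # round sizes only

chi2_nat, _ = benford_chi2(natural)
chi2_bot, p_bot = benford_chi2(bots)
print(f"natural chi2 = {chi2_nat:.1f}, bot chi2 = {chi2_bot:.1f}")
```

The scientific-notation trick (`f"{x:e}"[0]`) extracts the first significant digit robustly for any positive magnitude.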
Variance Ratio Test
The Lo-MacKinlay variance ratio test directly checks the random walk hypothesis. If prices follow a random walk, the variance of q-period returns should scale linearly with q:
VR(q) = Var(r_t(q)) / (q × Var(r_t))
where r_t(q) = ln(P(t)) - ln(P(t-q)) is the q-period log return and r_t = ln(P(t)) - ln(P(t-1)) is the single-period log return.
Under the random walk null hypothesis, VR(q) = 1.0. The test statistic (heteroskedasticity-robust):
z*(q) = (VR(q) - 1) / √(φ*(q))
where φ*(q) is the heteroskedasticity-consistent variance estimator. Under the null, z* is asymptotically standard normal.
Interpretation for manipulation detection:
- VR(q) significantly > 1.0: Positive serial correlation. Prices trend — consistent with momentum-based manipulation (coordinated buying/selling pressure).
- VR(q) significantly < 1.0: Negative serial correlation. Prices mean-revert — consistent with pump-and-dump manipulation.
Test at multiple horizons: q = 2, 5, 10, 20 (for hourly data, this covers 2 hours to ~1 day).
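A minimal homoskedastic sketch of VR(q) on simulated paths (the full heteroskedasticity-robust version appears in the Implementation section): a random walk sits near 1.0 at every horizon, while AR(1) returns with positive persistence push VR(q) above 1.

```python
import numpy as np

def variance_ratio(prices: np.ndarray, q: int) -> float:
    """VR(q) = Var(q-period log return) / (q * Var(1-period log return))."""
    lp = np.log(prices)
    r1 = np.diff(lp)
    rq = lp[q:] - lp[:-q]
    return float(np.var(rq, ddof=1) / (q * np.var(r1, ddof=1)))

rng = np.random.default_rng(4)
n = 2000

# Random walk: VR(q) hovers near 1.0 at every horizon.
walk = np.exp(np.cumsum(rng.normal(0, 0.01, n)))

# Trending path: AR(1) returns with positive persistence -> VR(q) > 1.
r = np.zeros(n)
eps = rng.normal(0, 0.01, n)
for t in range(1, n):
    r[t] = 0.5 * r[t - 1] + eps[t]
trending = np.exp(np.cumsum(r))

for q in (2, 5, 10):
    print(q, round(variance_ratio(walk, q), 2),
          round(variance_ratio(trending, q), 2))
```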
Wash Trading Detection via Graph Analysis
Wash trading — a single entity trading with itself through multiple wallets — is among the most common forms of manipulation in crypto-native prediction markets. Detection uses transaction graph analysis.
Construct a directed graph G = (V, E) where:
- V = set of wallet addresses that traded in the market
- E = set of directed edges (buyer → seller) for each filled trade
Wash trading creates short cycles in this graph. If wallet A sells to wallet B, and wallet B sells back to wallet A, that’s a 2-cycle. In practice, manipulators use chains: A → B → C → D → A (4-cycle) to obscure the pattern.
Detection algorithm:
- Build the transaction graph from trade history
- Find all strongly connected components (SCCs) — subsets of nodes where every node is reachable from every other node
- For each SCC, compute the cycle ratio: (edges within SCC) / (total edges involving SCC nodes)
- SCCs with cycle ratio > 0.5 and total volume > threshold are flagged as wash trading clusters
The volume concentration metric adds a second filter:
HHI_volume = Σ (V_i / V_total)²
where V_i is wallet i’s volume and V_total is total market volume. HHI > 0.25 (equivalent to fewer than 4 equal-sized participants) in a market that should have broad participation signals concentrated activity — potentially wash trading.
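The HHI computation is a one-liner over per-wallet volume shares. A sketch with hypothetical wallets contrasting broad participation with a whale-dominated market:

```python
import numpy as np

def volume_hhi(wallet_volumes: dict[str, float]) -> float:
    """Herfindahl-Hirschman index over per-wallet volume shares."""
    v = np.array(list(wallet_volumes.values()), dtype=float)
    shares = v / v.sum()
    return float(np.sum(shares ** 2))

broad = {f"w{i}": 100.0 for i in range(20)}  # 20 equal participants
whale_market = {"whale": 8_000.0, "a": 1_000.0, "b": 500.0, "c": 500.0}

print(volume_hhi(broad))         # 20 equal shares -> 1/20 = 0.05
print(volume_hhi(whale_market))  # far above the 0.25 threshold
```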
Worked Examples
Example 1: Detecting the Polymarket Whale (2024 US Election)
In October 2024, a single Polymarket trader using the handle “Fredi9999” accumulated over $30M in YES positions on “Will Donald Trump win the 2024 presidential election?” The YES price moved from approximately $0.50 to $0.66 over several weeks, with the price movement tightly correlated to this wallet’s buying activity.
Applying the detection metrics:
Volume z-score during accumulation period:
- Rolling 72h mean volume: $2.1M/day
- Rolling 72h standard deviation: $1.8M/day
- Peak day volume: $11.4M
- z = (11.4 - 2.1) / 1.8 = 5.17 → FLAGGED

Amihud illiquidity ratio:
- |ΔP| during peak buying: 0.08 (8 cents)
- Volume during move: $11.4M
- ILLIQ = 0.08 / 11.4M = 7.0 × 10⁻⁹
- Low ILLIQ — the move required substantial capital, not thin-market manipulation

Volume concentration (HHI):
- Fredi9999 share: ~40% of total volume
- HHI estimate: 0.40² + smaller terms from remaining wallets ≈ 0.19 → borderline (just below the 0.25 threshold)

Variance ratio VR(5) during accumulation:
- VR(5) = 1.38 → z* = 2.41 → FLAGGED (positive serial correlation)
The analysis reveals an important nuance: the volume z-score and variance ratio both flag anomalies, but the low Amihud ratio shows the trader used real capital. This was not thin-market manipulation — it was concentrated positioning by a well-capitalized actor. Whether this constitutes “manipulation” or “informed trading” is debatable, but the agent’s filter should flag it regardless. An agent entering the opposite side of a $30M position needs to know the counterparty concentration.
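The headline figures above can be reproduced directly (all inputs taken from the example, volumes in $M):

```python
# Figures from the example above, volumes in $M.
mean_vol, std_vol, peak_vol = 2.1, 1.8, 11.4
z = (peak_vol - mean_vol) / std_vol        # above the 3.0 flag threshold

illiq = 0.08 / 11.4e6                      # 8-cent move on $11.4M volume
print(f"z = {z:.2f}, ILLIQ = {illiq:.1e}")  # z = 5.17, ILLIQ = 7.0e-09
```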
Example 2: Wash Trading Detection on a Kalshi Market
Consider a Kalshi market on “Will US GDP growth exceed 3% in Q1 2026?” with the following trade graph among wallets A through F:
Trade History:
- A sells 500 YES to B at $0.45
- B sells 480 YES to C at $0.46
- C sells 490 YES to A at $0.44
- D buys 200 YES from E at $0.45
- E buys 150 YES from F at $0.46

Graph analysis:
- SCC found: {A, B, C}
- Cycle ratio: 3 edges in cycle / 3 total edges involving {A, B, C} = 1.0
- Volume in cycle: 500 + 480 + 490 = 1,470 contracts
- Total market volume: 1,470 + 200 + 150 = 1,820 contracts
- Wash volume ratio: 1,470 / 1,820 = 80.8% → FLAGGED
- Volume HHI: dominated by 3 wallets with circular flow
- Agent recommendation: DO NOT TRADE — 81% of volume is wash
The D-E-F trades form a tree (no cycles) and appear legitimate. The agent should discount the market’s implied probability because 81% of price-discovery volume is artificial.
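The example's numbers fall out of a few lines once the SCC is known (here the A-B-C component is given rather than computed; a Tarjan or Kosaraju pass would find it):

```python
# Example 2's trade graph as (buyer, seller, contracts) edges — direction
# buyer -> seller, matching the construction in the text.
trades = [
    ("B", "A", 500), ("C", "B", 480), ("A", "C", 490),  # the A-B-C cycle
    ("D", "E", 200), ("E", "F", 150),                   # acyclic "tree" trades
]
scc = {"A", "B", "C"}  # the SCC a Tarjan/Kosaraju pass would return

in_cycle = [v for b, s, v in trades if b in scc and s in scc]
touching = [v for b, s, v in trades if b in scc or s in scc]

cycle_ratio = len(in_cycle) / len(touching)
wash_ratio = sum(in_cycle) / sum(v for _, _, v in trades)
print(f"cycle ratio = {cycle_ratio:.1f}, wash volume = {wash_ratio:.1%}")
```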
Implementation
import numpy as np
from scipy import stats
from collections import defaultdict
from dataclasses import dataclass, field
@dataclass
class ManipulationReport:
"""Output of manipulation detection pipeline."""
market_id: str
volume_zscore: float
volume_flag: bool
amihud_ratio: float
autocorrelation_lag1: float
reversion_flag: bool
variance_ratio: dict[int, float] = field(default_factory=dict)
vr_flag: bool = False
benford_chi2: float = 0.0
benford_pvalue: float = 1.0
benford_flag: bool = False
wash_trade_clusters: int = 0
wash_volume_ratio: float = 0.0
wash_flag: bool = False
overall_risk: str = "LOW"
class ManipulationDetector:
"""
Pre-trade manipulation detection module for prediction market agents.
Sits between price ingestion (Layer 3) and decision engine (Layer 4).
"""
def __init__(
self,
volume_zscore_threshold: float = 3.0,
autocorr_threshold: float = -0.15,
benford_alpha: float = 0.05,
wash_cycle_ratio_threshold: float = 0.5,
vr_significance: float = 1.96
):
self.volume_zscore_threshold = volume_zscore_threshold
self.autocorr_threshold = autocorr_threshold
self.benford_alpha = benford_alpha
self.wash_cycle_ratio_threshold = wash_cycle_ratio_threshold
self.vr_significance = vr_significance
def volume_anomaly(
self,
volumes: np.ndarray,
lookback: int = 72
) -> tuple[float, bool]:
"""
Compute rolling z-score of current volume vs lookback window.
Args:
volumes: Array of volume observations (e.g., hourly).
lookback: Number of periods for rolling window.
Returns:
(z_score, is_anomalous)
"""
if len(volumes) < lookback + 1:
return 0.0, False
window = volumes[-(lookback + 1):-1]
current = volumes[-1]
mu = np.mean(window)
sigma = np.std(window, ddof=1)
if sigma == 0:
return 0.0, False
z = (current - mu) / sigma
return float(z), abs(z) > self.volume_zscore_threshold
def amihud_illiquidity(
self,
prices: np.ndarray,
volumes: np.ndarray
) -> np.ndarray:
"""
Compute Amihud illiquidity ratio: |return| / dollar_volume.
Args:
prices: Array of prices.
volumes: Array of dollar volumes (same length as prices).
Returns:
Array of illiquidity ratios (length = len(prices) - 1).
"""
returns = np.abs(np.diff(prices) / prices[:-1])
vol = volumes[1:]
vol = np.where(vol == 0, np.nan, vol)
return returns / vol
def return_autocorrelation(
self,
prices: np.ndarray,
lag: int = 1
) -> tuple[float, bool]:
"""
Compute autocorrelation of returns at specified lag.
Args:
prices: Array of prices.
lag: Autocorrelation lag (default 1).
Returns:
(autocorrelation, is_mean_reverting)
"""
returns = np.diff(np.log(prices))
if len(returns) < lag + 2:
return 0.0, False
r1 = returns[lag:]
r2 = returns[:-lag]
corr = np.corrcoef(r1, r2)[0, 1]
return float(corr), corr < self.autocorr_threshold
    def variance_ratio_test(
        self,
        prices: np.ndarray,
        periods: list[int] | None = None
    ) -> dict[int, tuple[float, float, bool]]:
"""
Lo-MacKinlay variance ratio test at multiple horizons.
Args:
prices: Array of prices.
periods: List of holding periods q to test.
Returns:
Dict of q -> (VR(q), z_statistic, is_significant).
"""
if periods is None:
periods = [2, 5, 10, 20]
log_prices = np.log(prices)
returns_1 = np.diff(log_prices)
n = len(returns_1)
var_1 = np.var(returns_1, ddof=1)
results = {}
for q in periods:
if n < 2 * q:
continue
returns_q = log_prices[q:] - log_prices[:-q]
var_q = np.var(returns_q, ddof=1)
vr = var_q / (q * var_1) if var_1 > 0 else 1.0
            # Heteroskedasticity-robust variance (Lo-MacKinlay):
            # phi* = sum over j of [2(q-j)/q]^2 * delta_j, where
            # delta_j = n * sum(r_t^2 * r_{t-j}^2) / (sum r_t^2)^2
            theta = 0.0
            for j in range(1, q):
                delta_j = 0.0
                for t in range(j, n):
                    delta_j += (returns_1[t] ** 2) * (returns_1[t - j] ** 2)
                mu_sq = np.mean(returns_1 ** 2) ** 2
                if mu_sq > 0:
                    delta_j = (delta_j / n) / mu_sq
                theta += ((2 * (q - j)) / q) ** 2 * delta_j
            # z*(q) = sqrt(n) * (VR(q) - 1) / sqrt(theta), asymptotically N(0,1)
            se = np.sqrt(theta / n) if theta > 0 else 1.0
            z_star = (vr - 1) / se if se > 0 else 0.0
is_sig = abs(z_star) > self.vr_significance
results[q] = (float(vr), float(z_star), is_sig)
return results
def benford_test(
self,
trade_sizes: np.ndarray
) -> tuple[float, float, bool]:
"""
Benford's law first-digit test on trade sizes.
Args:
trade_sizes: Array of trade sizes (positive numbers).
Returns:
(chi2_statistic, p_value, rejects_benford)
"""
trade_sizes = trade_sizes[trade_sizes > 0]
if len(trade_sizes) < 50:
return 0.0, 1.0, False
        # First significant digit via scientific notation: 0.0042 -> 4, 500 -> 5
        first_digits = np.array([int(f"{x:e}"[0]) for x in trade_sizes])
n = len(first_digits)
observed = np.zeros(9)
for d in first_digits:
if 1 <= d <= 9:
observed[d - 1] += 1
expected = np.array([
n * np.log10(1 + 1 / d) for d in range(1, 10)
])
# Avoid division by zero
mask = expected > 0
chi2 = np.sum((observed[mask] - expected[mask]) ** 2 / expected[mask])
p_value = 1 - stats.chi2.cdf(chi2, df=8)
return float(chi2), float(p_value), p_value < self.benford_alpha
def detect_wash_cycles(
self,
trades: list[tuple[str, str, float]]
) -> tuple[int, float]:
"""
Detect wash trading cycles in transaction graph.
Args:
trades: List of (buyer_address, seller_address, volume) tuples.
Returns:
(num_wash_clusters, wash_volume_ratio)
"""
# Build adjacency list
graph: dict[str, set[str]] = defaultdict(set)
volume_by_edge: dict[tuple[str, str], float] = defaultdict(float)
total_volume = 0.0
for buyer, seller, vol in trades:
graph[buyer].add(seller)
volume_by_edge[(buyer, seller)] += vol
total_volume += vol
        # Find strongly connected components using Tarjan's algorithm
        # (recursive version; very large graphs may need sys.setrecursionlimit)
index_counter = [0]
stack: list[str] = []
lowlink: dict[str, int] = {}
index: dict[str, int] = {}
on_stack: dict[str, bool] = {}
sccs: list[set[str]] = []
all_nodes = set(graph.keys())
for targets in graph.values():
all_nodes.update(targets)
def strongconnect(v: str) -> None:
index[v] = index_counter[0]
lowlink[v] = index_counter[0]
index_counter[0] += 1
stack.append(v)
on_stack[v] = True
for w in graph.get(v, set()):
if w not in index:
strongconnect(w)
lowlink[v] = min(lowlink[v], lowlink[w])
elif on_stack.get(w, False):
lowlink[v] = min(lowlink[v], index[w])
if lowlink[v] == index[v]:
scc: set[str] = set()
while True:
w = stack.pop()
on_stack[w] = False
scc.add(w)
if w == v:
break
if len(scc) > 1: # Only multi-node SCCs are wash candidates
sccs.append(scc)
for node in all_nodes:
if node not in index:
strongconnect(node)
# Calculate wash volume
wash_clusters = 0
wash_volume = 0.0
for scc in sccs:
scc_edges = 0
scc_volume = 0.0
total_scc_edges = 0
for buyer, seller in volume_by_edge:
if buyer in scc or seller in scc:
total_scc_edges += 1
if buyer in scc and seller in scc:
scc_edges += 1
scc_volume += volume_by_edge[(buyer, seller)]
if total_scc_edges > 0:
cycle_ratio = scc_edges / total_scc_edges
if cycle_ratio > self.wash_cycle_ratio_threshold:
wash_clusters += 1
wash_volume += scc_volume
wash_ratio = wash_volume / total_volume if total_volume > 0 else 0.0
return wash_clusters, float(wash_ratio)
def runs_test(
self,
prices: np.ndarray
) -> tuple[float, float]:
"""
Wald-Wolfowitz runs test for randomness of price direction.
A 'run' is a consecutive sequence of same-direction moves.
Too few runs → trending (momentum manipulation).
Too many runs → mean-reverting (pump-and-dump).
Args:
prices: Array of prices.
Returns:
(z_statistic, p_value)
"""
changes = np.diff(prices)
signs = np.sign(changes)
signs = signs[signs != 0] # Remove zero changes
if len(signs) < 20:
return 0.0, 1.0
n1 = np.sum(signs > 0) # Number of positive changes
n2 = np.sum(signs < 0) # Number of negative changes
n = n1 + n2
# Count runs
runs = 1
for i in range(1, len(signs)):
if signs[i] != signs[i - 1]:
runs += 1
# Expected runs and variance under null (random sequence)
expected_runs = (2 * n1 * n2) / n + 1
var_runs = (2 * n1 * n2 * (2 * n1 * n2 - n)) / (n ** 2 * (n - 1))
if var_runs <= 0:
return 0.0, 1.0
z = (runs - expected_runs) / np.sqrt(var_runs)
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
return float(z), float(p_value)
def full_scan(
self,
market_id: str,
prices: np.ndarray,
volumes: np.ndarray,
trade_sizes: np.ndarray,
        trades: list[tuple[str, str, float]] | None = None
) -> ManipulationReport:
"""
Run complete manipulation detection pipeline on a market.
Args:
market_id: Market identifier (e.g., Polymarket condition_id).
prices: Price time series.
volumes: Volume time series (same length as prices).
trade_sizes: Array of individual trade sizes.
trades: Optional list of (buyer, seller, volume) for wash detection.
Returns:
ManipulationReport with all test results and overall risk.
"""
vol_z, vol_flag = self.volume_anomaly(volumes)
amihud = np.nanmean(self.amihud_illiquidity(prices, volumes))
autocorr, rev_flag = self.return_autocorrelation(prices)
vr_results = self.variance_ratio_test(prices)
vr_flag = any(sig for _, _, sig in vr_results.values())
ben_chi2, ben_p, ben_flag = self.benford_test(trade_sizes)
wash_clusters = 0
wash_ratio = 0.0
wash_flag = False
if trades:
wash_clusters, wash_ratio = self.detect_wash_cycles(trades)
wash_flag = wash_ratio > 0.3
# Risk scoring: count flags
flags = sum([vol_flag, rev_flag, vr_flag, ben_flag, wash_flag])
if flags >= 3:
risk = "HIGH"
elif flags >= 2:
risk = "MEDIUM"
elif flags >= 1:
risk = "LOW-MEDIUM"
else:
risk = "LOW"
return ManipulationReport(
market_id=market_id,
volume_zscore=vol_z,
volume_flag=vol_flag,
amihud_ratio=float(amihud),
autocorrelation_lag1=autocorr,
reversion_flag=rev_flag,
variance_ratio={q: vr for q, (vr, _, _) in vr_results.items()},
vr_flag=vr_flag,
benford_chi2=ben_chi2,
benford_pvalue=ben_p,
benford_flag=ben_flag,
wash_trade_clusters=wash_clusters,
wash_volume_ratio=wash_ratio,
wash_flag=wash_flag,
overall_risk=risk,
)
# --- Demo: Run the detector on synthetic data ---
if __name__ == "__main__":
np.random.seed(42)
    # Simulate a manipulated market: pump ~12 cents above the clean path, then dump
n = 200
prices_clean = 0.50 + np.cumsum(np.random.normal(0, 0.005, n))
prices_clean = np.clip(prices_clean, 0.01, 0.99)
# Inject manipulation: pump at t=100-120, dump at t=130-150
prices_manip = prices_clean.copy()
prices_manip[100:120] += np.linspace(0, 0.12, 20)
prices_manip[130:150] -= np.linspace(0, 0.10, 20)
prices_manip = np.clip(prices_manip, 0.01, 0.99)
# Volume: normal ~1000, spike during manipulation
volumes = np.random.lognormal(mean=7, sigma=0.5, size=n)
volumes[100:120] *= 5 # 5x volume during pump
# Trade sizes: mix of natural and bot-generated round numbers
natural_sizes = np.random.lognormal(mean=4, sigma=1.5, size=500)
bot_sizes = np.random.choice([100, 500, 1000, 5000], size=200)
trade_sizes = np.concatenate([natural_sizes, bot_sizes])
# Wash trades
wash_trades = [
("0xAAA", "0xBBB", 5000),
("0xBBB", "0xCCC", 4800),
("0xCCC", "0xAAA", 4900),
("0xDDD", "0xEEE", 2000),
("0xFFF", "0xGGG", 1500),
("0xHHH", "0xIII", 800),
]
detector = ManipulationDetector()
report = detector.full_scan(
market_id="polymarket-test-market",
prices=prices_manip,
volumes=volumes,
trade_sizes=trade_sizes,
trades=wash_trades,
)
print(f"Market: {report.market_id}")
print(f"Overall Risk: {report.overall_risk}")
print(f"\nVolume z-score: {report.volume_zscore:+.2f} {'FLAG' if report.volume_flag else 'OK'}")
print(f"Amihud illiquidity: {report.amihud_ratio:.2e}")
print(f"Autocorrelation(1): {report.autocorrelation_lag1:+.3f} {'FLAG' if report.reversion_flag else 'OK'}")
print(f"Variance ratios: {report.variance_ratio} {'FLAG' if report.vr_flag else 'OK'}")
print(f"Benford chi2: {report.benford_chi2:.2f} (p={report.benford_pvalue:.4f}) {'FLAG' if report.benford_flag else 'OK'}")
print(f"Wash clusters: {report.wash_trade_clusters} ratio={report.wash_volume_ratio:.1%} {'FLAG' if report.wash_flag else 'OK'}")
Limitations and Edge Cases
Benford’s law requires sufficient sample size. Below ~50 trades, the chi-squared test has no statistical power. Many Kalshi markets and low-volume Polymarket markets fall below this threshold. In thin markets, skip Benford and rely on volume and graph analysis.
Legitimate whales look like manipulators. The Polymarket 2024 election whale illustrates this. A well-capitalized trader with genuine conviction produces the same volume anomalies as a manipulator. The volume z-score flags both. The distinguishing factor is whether the price movement persists (informed trading) or reverts (manipulation). This means the detector produces false positives on genuine large orders — the cost of caution.
Wash trading detection requires counterparty data. On Polymarket (Polygon blockchain), all trades are on-chain, so counterparty wallets are visible. On Kalshi (centralized matching engine), individual trade counterparties are not public. The wash trading detector works only on chain-visible markets.
Variance ratio tests assume stationarity. Prediction market prices are inherently non-stationary — they converge to 0 or 1 as the event resolves. The variance ratio test is most reliable in the mid-life of a market (well before resolution) when prices fluctuate around a relatively stable level. Near expiration, natural convergence creates serial correlation that mimics manipulation.
Spoofing detection requires real-time orderbook data. Historical orderbook snapshots miss the rapid place-cancel-place cycle of spoofing. Agents need WebSocket connections to the Polymarket CLOB API to detect spoofing in real time. Batch analysis of filled trades will miss it entirely.
Coordinated multi-wallet manipulation is hard to detect. If a manipulator uses 100 wallets with no direct trading between them (each wallet only trades against organic counterparties), graph analysis won’t find cycles. Detecting this requires clustering wallets by behavioral similarity (timing, size distribution, price aggressiveness) — a machine learning problem beyond the scope of these statistical tests.
FAQ
How do you detect market manipulation in prediction markets?
Four primary statistical tests detect manipulation in prediction markets: volume anomaly detection using rolling z-scores (threshold |z| > 3), variance ratio tests checking the random walk hypothesis (VR(q) should equal 1.0 in efficient markets), Benford’s law analysis on trade sizes (natural data follows P(d) = log10(1 + 1/d) for first digits), and graph analysis of counterparty relationships to find wash trading cycles. An agent runs these as pre-trade filters before entering any position.
What is wash trading in prediction markets and how is it detected?
Wash trading is when a single entity trades with itself to inflate volume and create false price signals. Detection uses graph analysis: construct a directed graph where nodes are wallet addresses and edges are trades. Short cycles (A → B → A or A → B → C → A) indicate self-dealing. On Polymarket, agents can pull trade history via the CLOB API and build counterparty graphs using NetworkX.
What is the variance ratio test for market efficiency?
The Lo-MacKinlay variance ratio test checks whether prices follow a random walk. VR(q) = Var(r_t(q)) / (q × Var(r_t)), where r_t(q) is the q-period return and r_t is the single-period return. In an efficient market, VR(q) = 1.0. Values significantly above 1.0 suggest positive serial correlation (momentum manipulation); values below 1.0 suggest mean-reversion (pump-and-dump cycles).
How does Benford’s law apply to detecting trading manipulation?
Benford’s law states that in naturally occurring datasets, the first digit d appears with frequency P(d) = log10(1 + 1/d). The digit 1 appears ~30.1% of the time, not 11.1%. Artificial trade sizes — generated by bots using round numbers or uniform distributions — violate this pattern. A chi-squared test comparing observed first-digit frequencies against Benford’s expected frequencies detects synthetic trading activity.
What was the 2024 Polymarket whale controversy?
In October 2024, a single trader (identified by the wallet “Fredi9999”) accumulated over $30M in YES positions on the US presidential election market on Polymarket, moving the YES price from roughly $0.50 to $0.66. Statistical analysis showed the price movement correlated almost entirely with this wallet’s activity rather than new information. The episode demonstrated how concentrated capital can distort prediction market prices, especially in markets where total liquidity is limited relative to the position size.
What’s Next
Manipulation detection is a defensive capability — it tells your agent where not to trade. The offensive counterpart is building models that predict where to trade profitably.
- Next in the series: NFL Mathematical Modeling — applying statistical modeling to the deepest sports betting market in the US.
- Scoring model quality: Prediction Market Scoring Rules covers Brier scores and logarithmic scoring for evaluating whether your agent’s probability estimates are well-calibrated.
- The full orderbook picture: Market Microstructure explains how orderbooks, spreads, and liquidity dynamics work — the data substrate manipulation detection relies on.
- Efficient markets context: The Efficient Market Hypothesis in Prediction Markets defines when and why markets deviate from efficiency, providing the theoretical backdrop for manipulation analysis.
- Build the full pipeline: The Agent Betting Stack shows where manipulation detection fits in the four-layer architecture — between data ingestion and decision execution.
- Sharp betting perspective: Sharp Betting covers the broader context of how professional bettors and agents maintain edge in adversarial markets.
