Poker AI has evolved from CFR brute-force solvers (Libratus, Pluribus) to lightweight LLM-powered agents that trade theoretical optimality for accessibility, multi-player scalability, and integration with broader agent frameworks. For builders, poker is the ideal casino wedge — rich strategy, rake-based economics, and now a live agent-friendly surface at Realbet. This guide covers the technical history and the current builder landscape.

Poker is the richest technical domain in casino AI. It is also the game where autonomous agents are most likely to gain a commercial foothold first. The strategy is deep enough to justify serious automation, the economic model (rake) rewards volume regardless of who generates it, and the technical history — from academic CFR breakthroughs to open-source LLM bots — gives builders a real foundation to start from.

This guide covers three eras of poker AI, how they connect to the current agent infrastructure opportunity, and what builders need to know to work in this space.

Era 1: CFR Solvers (2015–2019) — The Brute-Force Breakthrough

Counterfactual regret minimization (CFR) is the algorithm family that proved machines can beat the best human poker players. The core idea: repeatedly traverse the game tree in self-play, track how much “regret” each decision accumulates over millions of iterations, and shift probability toward the actions with the most positive regret. In two-player zero-sum games, the average strategy converges toward a Nash equilibrium, which an opponent cannot exploit in expectation.
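To make the regret idea concrete, here is a minimal sketch of regret matching, the update rule at the heart of CFR, run in self-play on rock-paper-scissors. This is a toy, not a poker solver: full CFR applies the same update at every decision point of a game tree, weighted by reach probabilities. The function names and the asymmetric starting regrets are illustrative assumptions.

```python
# Regret matching in self-play on rock-paper-scissors (illustrative toy, not poker).
ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

# PAYOFF[a][b] is the payoff to the player choosing a against an opponent choosing b.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regret_sum):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS


def train(iterations=20000):
    # Asymmetric starting regrets so the players do not begin at equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    strategy_sums = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    for _ in range(iterations):
        # Both strategies are fixed before either player's regrets are updated.
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected payoff of each pure action against the opponent's current mix.
            action_values = [
                sum(PAYOFF[a][b] * opponent[b] for b in range(ACTIONS))
                for a in range(ACTIONS)
            ]
            current_value = sum(
                strategies[player][a] * action_values[a] for a in range(ACTIONS)
            )
            for a in range(ACTIONS):
                regrets[player][a] += action_values[a] - current_value
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy, not the final one, is what approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, average in enumerate(train()):
        print(player, [round(p, 3) for p in average])
```

Both players' average strategies drift toward the uniform mix, which is the equilibrium of this game. Poker solvers need abstraction and sampling on top of this because the tree is far too large to traverse exactly.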

Libratus (2017)

Carnegie Mellon’s Libratus defeated four top professional poker players in a 120,000-hand heads-up no-limit Texas Hold’em match in January 2017 (Carnegie Mellon). The system used Monte Carlo CFR with three key innovations: an abstraction of the game tree to make computation tractable, a subgame solver that recomputed strategies in real-time for specific situations, and a self-improver that identified and patched exploitable patterns in its own play overnight.

The cost was enormous. Libratus required millions of CPU core-hours on the Pittsburgh Supercomputing Center’s Bridges system and terabytes of memory to precompute its blueprint strategy.

Pluribus (2019)

Pluribus extended the breakthrough to six-player poker — the most common format played worldwide. The system achieved superhuman performance in six-player no-limit Texas Hold’em and was published in Science (Science).

Pluribus was far more computationally efficient than Libratus: its blueprint strategy was computed in about eight days on a single 64-core server with less than 512 GB of RAM. Rather than searching to the end of the hand, it used depth-limited search that looks only a few actions ahead. At the edge of that search, it assumes each player may continue with one of a small set of strategies (the blueprint or versions of it biased toward folding, calling, or raising) rather than a single fixed strategy.

What CFR means for builders

CFR solvers are the gold standard for poker AI theory. They produce strategies that are provably near-optimal. But they have three practical limitations for agent builders:

  1. Compute cost: Even Pluribus required substantial hardware for initial training. Running full CFR in real-time for arbitrary game states is not feasible on consumer hardware.
  2. Heads-up bias: Most CFR research focused on two-player games. Multi-player CFR scales poorly because the game tree grows exponentially with each additional player, and CFR's guarantee of converging to a Nash equilibrium holds only for two-player zero-sum games.
  3. No adaptability: Pure CFR strategies are balanced and unexploitable, but they do not exploit weak opponents. Against recreational players — the most common opponents on real poker platforms — an exploitative approach often wins more.

Era 2: Commercial Solvers (2020–2023) — GTO Goes Mainstream

The second era commercialized CFR insights into tools that human players use for study and training. PioSOLVER, GTO Wizard, MonkerSolver, and similar products let players compute near-optimal solutions for specific poker situations.

These tools transformed how professionals study the game. Instead of learning from feel and experience alone, top players now drill solver outputs to internalize GTO frequencies for every street, position, and bet size.

For agent builders, the solver era matters because it created a dense corpus of strategy knowledge that can be distilled into agent decision logic. The key outputs — preflop ranges by position, postflop bet sizing, check-raise frequencies, river bluff-to-value ratios — are the raw material for any poker agent’s strategy layer, whether that agent uses a solver, a lookup table, or an LLM.

Era 3: LLM-Powered Poker Agents (2024–Present) — The Accessible Frontier

The third era replaces CFR computation with language model reasoning. LLM-based poker agents read game state as text, reason about strategy in natural language, and output actions. They are dramatically more accessible to build, faster to adapt, and easier to integrate with broader agent systems.

The academic PokerGPT

The PokerGPT paper (arXiv 2401.06781) demonstrated that a fine-tuned OPT-1.3B model could play competitive multi-player Texas Hold’em without any CFR computation (arXiv). The approach:

  1. Train on millions of real poker hand histories, filtering for high-win-rate player behavior
  2. Process hands into text-based instruction format using prompt engineering
  3. Fine-tune with supervised learning, then sharpen with RLHF using a reward model trained to recognize good decisions
  4. Deploy a model that reads game history as text and outputs poker actions
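As a rough illustration of step 2, the sketch below turns one decision point from a hand history into a prompt and target pair for supervised fine-tuning. The `HandRecord` schema, the field names, and the prompt wording are all hypothetical; the paper's actual prompt format is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class HandRecord:
    """One decision point from a hand history (hypothetical schema)."""
    hero_cards: list
    board: list
    position: str
    pot: float
    to_call: float
    stack: float
    action_history: list = field(default_factory=list)
    chosen_action: str = ""  # what the high-win-rate player actually did


def to_instruction_example(record):
    """Turn a decision point into a prompt/target pair for supervised fine-tuning."""
    board = " ".join(record.board) if record.board else "none (preflop)"
    history = "; ".join(record.action_history) or "no prior action"
    prompt = (
        "You are playing no-limit Texas Hold'em.\n"
        f"Your cards: {' '.join(record.hero_cards)}\n"
        f"Board: {board}\n"
        f"Position: {record.position}\n"
        f"Pot: {record.pot:g} | To call: {record.to_call:g} | Stack: {record.stack:g}\n"
        f"Action so far: {history}\n"
        "Choose one action: fold, check, call, or raise <amount>."
    )
    return {"prompt": prompt, "target": record.chosen_action}


if __name__ == "__main__":
    record = HandRecord(
        hero_cards=["Ah", "Kd"], board=[], position="BTN",
        pot=1.5, to_call=1.0, stack=100.0,
        action_history=["UTG folds", "CO folds"], chosen_action="raise 3",
    )
    example = to_instruction_example(record)
    print(example["prompt"])
    print("->", example["target"])
```

The same text format can be reused at inference time with the target left blank, so training and deployment see identical inputs.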

The result outperformed previous approaches including AlphaHoldem and ReBeL in win rate while using a fraction of the compute — trainable on a single GPU in under 10 hours.

Open-source LLM poker bots

Several open-source projects explore LLM-powered poker:

| Project | Approach | Platform |
| --- | --- | --- |
| HarperJonesGPT/PokerGPT | GPT-4 + Tesseract OCR, reads PokerStars client screen | PokerStars (6-player cash) |
| JulienDelavande/MistralBluff | Mistral-based poker decision model | Research/testing |
| pokernow-gpt | GPT-4/Claude + web scraping, tracks opponent VPIP/PFR stats | PokerNow (social poker) |
| dickreuter/Poker | OpenCV + genetic algorithm + Monte Carlo equity | PartyPoker, PokerStars, GGPoker |

These projects demonstrate different integration patterns. PokerGPT (HarperJonesGPT) uses OCR to read the game client — brittle but functional. pokernow-gpt uses web scraping with LLM reasoning — more flexible because it maintains opponent stats and hand history context. The dickreuter/Poker project predates the LLM era but shows the mature pattern of screen-reading automation.

LLMs vs CFR: the real tradeoff

LLMs cannot currently match CFR solvers in pure GTO play. Nate Silver’s assessment — that general LLMs are poor poker players without domain-specific scaffolding — is accurate for raw prompting. But with structured inputs (hand strength, equity calculations, opponent stats), LLMs can implement reasonable strategy through guided reasoning.

The practical tradeoffs:

| Factor | CFR solvers | LLM agents |
| --- | --- | --- |
| Theoretical optimality | Near-Nash equilibrium | Heuristic, not provably optimal |
| Multi-player scaling | Poor (exponential tree growth) | Good (text-based, player-count agnostic) |
| Compute requirements | High (days of GPU/CPU time) | Low (API call per decision) |
| Exploitative adaptation | Weak (plays balanced by default) | Strong (can reason about opponent tendencies) |
| Integration with agent frameworks | Difficult (specialized systems) | Native (LLMs are the agent reasoning layer) |
| Development speed | Months | Days |

For agent builders, the LLM approach is almost always the right starting point. CFR gives you better poker. LLMs give you a faster path to a working agent that can reason about more than just the cards.

Where Poker Agents Can Play

Platforms that ban bots

Most major poker platforms explicitly prohibit automated play. PokerStars’ prohibited software policy bans any tool that plays without human intervention or provides real-time advice based on game state (PokerStars). GGPoker, PartyPoker, 888poker, and virtually all regulated rooms have similar policies. Consequences include permanent account closure and fund confiscation.

Platforms where agents are allowed

Realbet.io is currently the only platform that explicitly positions agent play as a feature rather than a violation. The platform offers six-player Texas Hold’em tables funded in USDC with a 5% rake, and promotes AI-vs-AI spectator tables featuring foundation models (MEXC).
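For a sense of what a 5% rake means per pot, here is a small worked calculation. Only the 5% rate comes from the description above; the optional cap, the rounding to cents, and the function names are assumptions, since the platform's exact rake rules are not stated here.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def rake_for_pot(pot, rate=Decimal("0.05"), cap=None):
    """Rake taken from a pot; `cap` is a hypothetical per-pot maximum."""
    rake = (Decimal(pot) * rate).quantize(CENT, rounding=ROUND_HALF_UP)
    if cap is not None:
        rake = min(rake, Decimal(cap))
    return rake


def net_win(pot, hero_contribution, rate=Decimal("0.05"), cap=None):
    """Profit for the pot winner after rake and their own chips are removed."""
    return Decimal(pot) - rake_for_pot(pot, rate, cap) - Decimal(hero_contribution)


if __name__ == "__main__":
    # A 40 USDC pot where the winner put in 20 of it.
    print(rake_for_pot("40"))   # rake paid to the room
    print(net_win("40", "20"))  # winner's profit after rake
```

Passing amounts as strings keeps the arithmetic exact, which matters once an agent is reconciling thousands of small pots against a wallet balance.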

For the full Realbet profile, see the marketplace entry. For the breaking news coverage, see Realbet Opens First Crypto Casino to Autonomous AI Agents.

PokerNow and social platforms

Social poker platforms like PokerNow run in browsers with no real money and no TOS enforcement against bots. They are useful for agent development and testing. The pokernow-gpt project demonstrates a complete LLM-powered poker agent running on this platform.

Building a Poker Agent: Architecture

A production poker agent has four layers that map directly to the Agent Betting Stack:

┌──────────────────────────────────────────┐
│  Layer 4: Intelligence                    │
│  ┌──────────────┐  ┌──────────────────┐  │
│  │ LLM Reasoning │  │ Equity Calculator │  │
│  │ (strategy)    │  │ (hand strength)   │  │
│  └──────────────┘  └──────────────────┘  │
│  ┌──────────────┐  ┌──────────────────┐  │
│  │ Opponent Model│  │ Bankroll Manager │  │
│  │ (VPIP, PFR)  │  │ (Kelly, stops)   │  │
│  └──────────────┘  └──────────────────┘  │
├──────────────────────────────────────────┤
│  Layer 3: Execution                       │
│  - Platform API / bot commands / OCR      │
│  - Action execution (fold/call/raise)     │
│  - Table selection and session management │
├──────────────────────────────────────────┤
│  Layer 2: Wallet                          │
│  - USDC/crypto funding                    │
│  - Buy-in management                      │
│  - Withdrawal automation                  │
├──────────────────────────────────────────┤
│  Layer 1: Identity                        │
│  - Platform account / wallet auth         │
│  - Session management                     │
│  - Agent reputation (if applicable)       │
└──────────────────────────────────────────┘
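One way to keep these layers separable in code is to define each as an interface and have the agent depend only on those interfaces. The sketch below is a hypothetical skeleton: the class names, method names, and the `min_balance` check are illustrative assumptions, not any platform's API.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Action:
    kind: str            # "fold", "check", "call", or "raise"
    amount: float = 0.0  # only meaningful for "raise"


class Identity(Protocol):      # Layer 1: account or wallet auth
    def authenticate(self) -> str: ...


class Wallet(Protocol):        # Layer 2: funding and buy-ins
    def balance(self) -> float: ...


class Execution(Protocol):     # Layer 3: talk to the table
    def read_state(self, session: str) -> dict: ...
    def send(self, session: str, action: Action) -> None: ...


class Intelligence(Protocol):  # Layer 4: choose the action
    def decide(self, state: dict) -> Action: ...


class PokerAgent:
    """Wires the four layers together for a single decision."""

    def __init__(self, identity: Identity, wallet: Wallet,
                 execution: Execution, intelligence: Intelligence,
                 min_balance: float = 0.0):
        self.identity = identity
        self.wallet = wallet
        self.execution = execution
        self.intelligence = intelligence
        self.min_balance = min_balance

    def play_one_decision(self) -> Optional[Action]:
        session = self.identity.authenticate()
        if self.wallet.balance() < self.min_balance:
            return None  # sit out rather than play underfunded
        state = self.execution.read_state(session)
        action = self.intelligence.decide(state)
        self.execution.send(session, action)
        return action
```

Because the execution layer is just an interface, the same intelligence layer can be tested against a social platform and later pointed at a real-money table by swapping one object.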

Intelligence layer details

The intelligence layer is where poker agents differ most from prediction market agents. Key components:

Hand evaluator: Pre-calculate hand strength, draws, and approximate equity. Do not rely on the LLM for this — feed it as structured input. Libraries like treys (Python) or poker-eval give you sub-millisecond hand evaluation.
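As a sketch of what "approximate equity" means here, the code below estimates heads-up equity against one random hand by Monte Carlo simulation. It uses only the standard library so it runs anywhere, which makes it far slower than a dedicated evaluator such as treys; the card encoding (`"As"`, `"Td"`) and all function names are assumptions for illustration.

```python
import random
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"
DECK = [r + s for r in RANKS for s in "cdhs"]


def rank5(cards):
    """Return a comparable tuple for a 5-card hand; larger tuples are stronger."""
    values = sorted((RANKS.index(c[0]) for c in cards), reverse=True)
    counts = Counter(values)
    # Order ranks by multiplicity, then by rank, e.g. full house -> (trips, pair).
    groups = sorted(counts, key=lambda v: (counts[v], v), reverse=True)
    shape = sorted(counts.values(), reverse=True)
    flush = len({c[1] for c in cards}) == 1
    unique = sorted(set(values), reverse=True)
    straight_high = None
    if len(unique) == 5:
        if unique[0] - unique[4] == 4:
            straight_high = unique[0]
        elif unique == [12, 3, 2, 1, 0]:  # wheel: A-2-3-4-5, five-high
            straight_high = 3
    if straight_high is not None and flush:
        return (8, straight_high)
    if shape == [4, 1]:
        return (7, *groups)
    if shape == [3, 2]:
        return (6, *groups)
    if flush:
        return (5, *values)
    if straight_high is not None:
        return (4, straight_high)
    if shape == [3, 1, 1]:
        return (3, *groups)
    if shape == [2, 2, 1]:
        return (2, *groups)
    if shape == [2, 1, 1, 1]:
        return (1, *groups)
    return (0, *values)


def best_rank(cards):
    """Best 5-card rank from 5 to 7 cards, by brute force over combinations."""
    return max(rank5(combo) for combo in combinations(cards, 5))


def equity_vs_random(hero, board=(), trials=2000, seed=0):
    """Estimate hero's share of the pot against one random opponent hand."""
    rng = random.Random(seed)
    known = set(hero) | set(board)
    remaining = [c for c in DECK if c not in known]
    score = 0.0
    for _ in range(trials):
        draw = rng.sample(remaining, 2 + 5 - len(board))
        villain, runout = draw[:2], list(board) + draw[2:]
        hero_rank = best_rank(list(hero) + runout)
        villain_rank = best_rank(villain + runout)
        if hero_rank > villain_rank:
            score += 1.0
        elif hero_rank == villain_rank:
            score += 0.5  # split pot
    return score / trials


if __name__ == "__main__":
    print(round(equity_vs_random(["As", "Ah"]), 3))
```

The number this produces is an estimate against a uniformly random hand. A real agent would usually weight the opponent's range by position and action, and would pass the result to the LLM as a plain number rather than asking the model to compute it.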

LLM reasoning: Given structured game state (hand strength, equity, pot odds, opponent stats, position, stack depth), the LLM decides the action and sizing. Use a system prompt that encodes GTO principles, then let the model reason about exploitative adjustments based on opponent tendencies.
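A minimal sketch of the structured input described above: compute pot odds outside the model, then serialize everything into one JSON payload for the prompt. The field names are hypothetical, and `pot` is assumed to mean the pot as it stands when the agent must act, including the bet being faced.

```python
import json


def pot_odds(pot, to_call):
    """Minimum equity needed for a call to break even: call / (pot + call)."""
    if to_call <= 0:
        return 0.0
    return to_call / (pot + to_call)


def build_decision_input(equity, pot, to_call, position, stack, opponent_stats):
    """Serialize pre-computed facts so the LLM reasons about them, not recomputes them."""
    return json.dumps({
        "equity": round(equity, 3),
        "pot": pot,
        "to_call": to_call,
        "required_equity": round(pot_odds(pot, to_call), 3),
        "position": position,
        "stack": stack,
        "opponent_stats": opponent_stats,
    }, indent=2)


if __name__ == "__main__":
    # Opponent bets 50 into 100, so the pot is 150 and it costs 50 to call.
    payload = build_decision_input(
        equity=0.31, pot=150, to_call=50, position="BB", stack=900,
        opponent_stats={"vpip": 0.42, "pfr": 0.08},
    )
    print(payload)
```

In this example the call needs 25% equity to break even, so an estimated 31% makes calling profitable before any exploitative adjustment. Handing the model both numbers lets it spend its reasoning on the adjustment instead of the arithmetic.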

Opponent model: Track VPIP (voluntarily put money in pot), PFR (pre-flop raise percentage), aggression factor, and fold-to-continuation-bet rates. These stats, accumulated over hands, let the LLM make exploitative decisions that pure GTO cannot.
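The stats named above are simple running ratios, so a tracker can be sketched in a few lines. The class and method names are hypothetical. The caller is assumed to decide what counts as voluntary (posting a blind does not), and aggression factor uses the common definition of postflop bets plus raises divided by calls.

```python
from dataclasses import dataclass


@dataclass
class OpponentStats:
    hands: int = 0
    vpip_hands: int = 0
    pfr_hands: int = 0
    aggressive_actions: int = 0  # postflop bets and raises
    calls: int = 0               # postflop calls
    cbets_faced: int = 0
    cbets_folded: int = 0

    def record_preflop(self, put_money_in: bool, raised: bool) -> None:
        """Call once per hand dealt to this opponent."""
        self.hands += 1
        if put_money_in or raised:  # a preflop raise is also voluntary money
            self.vpip_hands += 1
        if raised:
            self.pfr_hands += 1

    def record_postflop(self, action: str) -> None:
        if action in ("bet", "raise"):
            self.aggressive_actions += 1
        elif action == "call":
            self.calls += 1

    def record_cbet_response(self, folded: bool) -> None:
        self.cbets_faced += 1
        if folded:
            self.cbets_folded += 1

    @staticmethod
    def _ratio(numerator, denominator):
        # None signals "no data yet" instead of a misleading 0.0.
        return numerator / denominator if denominator else None

    def summary(self) -> dict:
        return {
            "hands": self.hands,
            "vpip": self._ratio(self.vpip_hands, self.hands),
            "pfr": self._ratio(self.pfr_hands, self.hands),
            "aggression_factor": self._ratio(self.aggressive_actions, self.calls),
            "fold_to_cbet": self._ratio(self.cbets_folded, self.cbets_faced),
        }
```

Including the `hands` count in the summary matters: a 100% VPIP over three hands means very little, and the model can only discount small samples if it sees the sample size.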

Bankroll manager: Size buy-ins with the Kelly criterion or a fraction of it, enforce a hard stop-loss per session, cap the number of simultaneous tables, and require a cooldown after a losing session.
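The sizing and stop-loss rules can be sketched as follows. This uses the textbook Kelly formula for a binary bet, which is a strong simplification: a poker session is not a single win-or-lose wager, so the win probability and odds here are assumed inputs, and the default multiplier and cap are arbitrary illustrative values.

```python
def kelly_fraction(win_prob: float, net_odds: float) -> float:
    """Kelly fraction f* = p - (1 - p) / b for a bet paying b-to-1; never negative."""
    if net_odds <= 0:
        return 0.0
    return max(0.0, win_prob - (1.0 - win_prob) / net_odds)


def buy_in_size(bankroll, win_prob, net_odds, kelly_multiplier=0.25, max_fraction=0.05):
    """Fractional Kelly buy-in, additionally capped at a fixed share of the bankroll."""
    fraction = min(kelly_fraction(win_prob, net_odds) * kelly_multiplier, max_fraction)
    return bankroll * fraction


def should_stop(session_pnl: float, stop_loss: float) -> bool:
    """True once session losses reach the stop-loss amount."""
    return session_pnl <= -abs(stop_loss)


if __name__ == "__main__":
    # A 55% even-money edge gives full Kelly of 10%; quarter Kelly stakes 2.5%.
    print(buy_in_size(1000.0, win_prob=0.55, net_odds=1.0))
    print(should_stop(session_pnl=-60.0, stop_loss=50.0))
```

Fractional Kelly is the usual choice because the edge estimate itself is uncertain, and overestimating it under full Kelly leads to much larger drawdowns than underestimating it.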

Why Poker Is the Casino Agent Wedge

Poker is the ideal entry point for casino agents because three conditions align:

  1. Strategy depth justifies automation: Unlike slots or roulette, poker has a meaningful skill edge. An agent with good strategy can generate positive expected value over time, not just survive against the house edge.

  2. Rake economics welcome volume: Poker rooms make money per hand dealt, regardless of who wins. More players at the table (including agents) means more rake. This is why Realbet frames agent play as a feature — the same logic that makes Polymarket welcome bots.

  3. Technical infrastructure exists: From CFR solvers to LLM reasoning to open-source poker bots, the building blocks for a poker agent are more mature than for any other casino game. You do not need to start from zero.

For the full casino agent infrastructure context, see Casino Agent Infrastructure. For why crypto casinos specifically are welcoming agent play, read From Polymarket to Poker Tables.

What’s Next