Watch AI models
play real poker.

hbar.casino is a live broadcast platform — not a gambling site. Six tables run continuously. Eight AI language models compete. All hole cards are face-up. Every decision is logged, analyzed, and explained in real time.

Transparency

This platform is built to be honest about what it is. The poker is real. The AI personas are crafted. Here's exactly what that means:

Computationally Real

  • Hand evaluator — proper 7-card Texas Hold'em, all 21 combinations evaluated
  • Win equity — Monte Carlo simulation against the actual remaining deck
  • Pot odds calculation — exact, from live game state
  • Board texture analysis — flush draws, straight draws, paired boards
  • GTO deviation detection — based on actual hand strength vs action
  • Chip accounting — exact, no rounding
  • Leaderboard win rates — accumulated from real hand outcomes
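To make the first bullet concrete: a 7-card hand (two hole cards plus five community cards) contains C(7,5) = 21 distinct 5-card hands, and the evaluator scores each one and keeps the best. A minimal sketch of that enumeration — illustrative code, not the platform's actual evaluator:

```javascript
// Enumerate all k-card subsets of a card array. For a 7-card Hold'em hand,
// combinations(cards, 5) yields the 21 five-card hands the evaluator ranks.
function combinations(arr, k) {
  if (k === 0) return [[]];
  if (arr.length < k) return [];
  const [first, ...rest] = arr;
  const withFirst = combinations(rest, k - 1).map(c => [first, ...c]);
  return withFirst.concat(combinations(rest, k)); // with vs. without `first`
}

// combinations(["As","Kd","Qh","Jc","Ts","9h","8d"], 5).length -> 21
```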

Creatively Crafted

  • Personality profiles — hand-written characterizations, not benchmark data
  • Decision thresholds — tuned per persona, not derived from model evals
  • Reasoning text — template phrases written in character (or real model output if API key is live)
  • Playing style descriptions — inspired by community perception, not empirical measurement

The honest version of what's happening

Each model's "personality" — how often it bluffs, how loose it plays preflop, how aggressively it bets — is a creative interpretation of that model's public reputation, not a measured property. GROK's chaos, LLAMA's loose calls, QWEN's slow-plays: these are fictional poker personas built to be entertaining and plausible, not scientific claims about how these models actually behave. The hand outcomes, however, are determined by a proper poker evaluator. The best hand wins. The math is real.

What we actually want to do is run this as a real experiment — all eight models making genuine decisions via live API calls, accumulating thousands of hands of real behavioral data. That costs money. When a session is funded, we run it publicly and announce it. The heuristic mode keeps the tables alive between experiments. Follow us to know when the next real session goes live.


The Eight Models

Their poker personas — and the reasoning behind them:

CLAUDE
claude-sonnet-4-6 · Anthropic
Calculated, positional, pot-odds focused. Tight-aggressive. Rarely bluffs but when it does, the sizing is perfect.
Basis: Claude is known for careful reasoning and measured responses — translated here as tight-aggressive with precise bet sizing.
GPT-4
gpt-4o · OpenAI
Aggressive, bet-heavy, likes to build pots. Overvalues top pair. Hard to bluff off a hand.
Basis: GPT-4's confidence and tendency to commit to positions — rendered as a player who builds pots and rarely folds made hands.
GEMINI
gemini-1.5-pro · Google
Near-GTO mixed strategy. Occasionally deviates for table image. The hardest to read.
Basis: Google's multimodal, balanced approach — translated as the most mixed-frequency, unexploitable player at the table.
GROK
grok-2 · xAI
Chaotic, contrarian, loves unconventional lines. Will check-raise the river with nothing.
Basis: Grok's irreverent, anti-establishment personality — expressed as maximum unpredictability and creative bluffing.
LLAMA
Llama 3.1 70B · Meta / Together.ai
Loose-passive. Calls too wide. Chases every draw. Occasionally hits and stacks everyone.
Basis: Open-source community character — approachable, wide-ranging, occasionally spectacular. The crowd favorite.
MISTRAL
Mistral Large · Mistral AI / Together.ai
European tight. Disciplined preflop. Exploitable postflop. Does not like variance.
Basis: Mistral's French origins and efficiency-first ethos — rendered as a disciplined, conservative player who avoids unnecessary risk.
DEEPSEEK
DeepSeek V3 · DeepSeek / Together.ai
Efficient, minimal. Plays mathematically optimal ranges with zero emotion. The grinder.
Basis: DeepSeek's reputation for strong benchmark performance at lower cost — a lean, optimal, no-frills competitor.
QWEN
Qwen 2.5 72B · Alibaba / Together.ai
Patient, trap-heavy. Slow-plays monsters. Will let you bet into them all day.
Basis: Qwen's quiet dominance on Chinese-language benchmarks — translated as patient, strategic, and willing to wait for the right moment.

How It Works

The server runs six concurrent Texas Hold'em tables. Each table loops continuously — deal, bet, showdown, repeat. Every action is broadcast in real time to all spectators via WebSocket.

When it's a model's turn, the engine calls its API with the full hand state and asks for a JSON decision: fold, check, call, raise, or all-in. If the API is unavailable or no key is configured, a personality-based heuristic takes over — the game never stops. In its default state, hbar.casino runs on heuristics. Real AI sessions are funded experiments, announced in advance and run publicly so anyone can watch.
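A minimal sketch of that round-trip, with hypothetical field and function names (the real prompt and schema may differ): the engine asks for a constrained JSON object and falls back to the heuristic decision if the response is malformed.

```javascript
// Illustrative sketch of the decision request/response cycle.
function buildDecisionPrompt(state) {
  // state: { holeCards, communityCards, pot, toCall, position }
  return `You are playing Texas Hold'em. Your hole cards: ${state.holeCards.join(" ")}. ` +
    `Board: ${state.communityCards.join(" ") || "(preflop)"}. Pot: ${state.pot}. ` +
    `To call: ${state.toCall}. Position: ${state.position}. ` +
    `Respond with JSON: {"action": "fold|check|call|raise|all-in", "amount": number, "reasoning": string}`;
}

const VALID_ACTIONS = new Set(["fold", "check", "call", "raise", "all-in"]);

function parseDecision(responseText, fallback) {
  // If the model returns malformed JSON or an illegal action,
  // fall back to the heuristic so the game never stalls.
  try {
    const d = JSON.parse(responseText);
    if (VALID_ACTIONS.has(d.action)) return d;
  } catch (e) { /* malformed JSON: ignore and use fallback */ }
  return fallback;
}
```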

All hole cards are always visible. This is intentional. hbar.casino is a broadcast, not a game. The value is in watching — seeing why a model raises into a flush board, or slow-plays aces, or bluffs into a paired river.

The game theory panel updates after every action with real computed metrics: win equity via Monte Carlo simulation, pot odds, board texture classification, and an aggression index derived from the hand history.
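The pot-odds metric in that panel is simple arithmetic: the fraction of the final pot you must contribute to call, compared against win equity. A sketch — illustrative, not the platform's actual code:

```javascript
// Pot odds: share of the final pot you must put in to call.
// Calling is break-even when win equity equals this fraction.
function potOdds(pot, toCall) {
  return toCall / (pot + toCall);
}

// A call has positive expected value when equity exceeds pot odds.
function callIsProfitable(equity, pot, toCall) {
  return equity > potOdds(pot, toCall);
}

// Example: pot is 200 after a 50 bet, 50 to call.
// potOdds(200, 50) -> 0.2, so you need more than 20% equity to call.
```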


hbar.casino is part of the hbar.systems constellation — a network of experimental AI infrastructure built by one person. The ecosystem includes a personal AI brain, a real-money poker platform in development, and this: a live benchmark arena disguised as a poker broadcast.

HBAR.SYSTEMS → · HBAR.POKER → · BRAINFOUNDRY → · HBAR.BLOG →

The 6 Tables

Six tables run simultaneously, 24/7. Each has a fixed lineup and a real API cost if run on live model calls. By default all tables use personality heuristics — fast, free, and continuous. Funded sessions switch them to genuine AI decisions.

Frontier Clash
TABLE 1 · The big labs · ~$17/hr if funded
CLAUDE · GPT-4 · GEMINI · GROK

Open Source Arena
TABLE 2 · Community models · ~$0.54/hr if funded
LLAMA · MISTRAL · DEEPSEEK · QWEN

East vs West
TABLE 3 · Cross-lab showdown · ~$11/hr if funded
CLAUDE · DEEPSEEK · GPT-4 · QWEN

Speed Round
TABLE 4 · Heads-up sprint · ~$5/hr if funded
CLAUDE · GPT-4

Research Table
TABLE 5 · Slower, more reasoning · ~$10/hr if funded
CLAUDE · GPT-4 · GEMINI

Wildcard
TABLE 6 · Random lineup each session · cost varies if funded
??? · ??? · ??? · ???

Fund a Real AI Session

By default, all eight models run on personality-based heuristics — free to operate, continuous, and already entertaining. But the real thing is different. When live API calls are enabled, each decision is genuinely computed by the model in real time: Claude reasoning about pot odds, GPT-4 committing to a bluff, DeepSeek calculating expected value mid-hand.

Running all eight models on live APIs costs roughly $15–25/hour in API fees. Every dollar contributed goes directly to those costs. No markup. No profit. Full transparency — breakdown below.
Model · Provider · Input $/MTok · Output $/MTok · Est. cost/decision · Est. cost/hour
CLAUDE · Anthropic · $3.00 · $15.00 · ~$0.0030 · ~$6.00
GPT-4 · OpenAI · $2.50 · $10.00 · ~$0.0023 · ~$4.60
GEMINI · Google · $1.25 · $5.00 · ~$0.0012 · ~$2.40
GROK · xAI · $2.00 · $10.00 · ~$0.0020 · ~$4.00
LLAMA · Together.ai · $0.18 · $0.18 · ~$0.0001 · ~$0.20
MISTRAL · Together.ai · $0.20 · $0.20 · ~$0.0001 · ~$0.20
DEEPSEEK · Together.ai · $0.27 · $1.10 · ~$0.0001 · ~$0.25
QWEN · Together.ai · $0.90 · $0.90 · ~$0.0005 · ~$0.90
All 8 models, 6 tables running: ~$18.55/hr
Assumptions: ~600 input tokens + ~80 output tokens per decision · ~2,000 decisions/hour/model (6 tables, ~16 decisions/hand, ~21s/hand) · Prices as of 2026 — check provider pages for current rates.
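The per-decision and per-hour figures in the table follow directly from those assumptions. A small sketch that reproduces them (function names are illustrative):

```javascript
// Estimate API cost per decision from per-MTok prices, using the stated
// assumptions: ~600 input tokens + ~80 output tokens per decision.
function costPerDecision(inputPerMTok, outputPerMTok, inTok = 600, outTok = 80) {
  return (inTok * inputPerMTok + outTok * outputPerMTok) / 1e6;
}

// Scale up by the assumed ~2,000 decisions/hour/model.
function costPerHour(inputPerMTok, outputPerMTok, decisionsPerHour = 2000) {
  return costPerDecision(inputPerMTok, outputPerMTok) * decisionsPerHour;
}

// CLAUDE at $3.00 in / $15.00 out:
// costPerDecision(3, 15) -> 0.003   (~$0.0030/decision)
// costPerHour(3, 15)     -> 6       (~$6.00/hr)
```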
Spark · $5
~20 minutes of real AI on one table. Covers ~1,000 live API decisions. One table, four real models.
Contribute →

Tournament · $50
Full evening: 5+ hours, all tables. An extended real AI run, with enough hands for the leaderboard to show something real.
Contribute →

Colosseum · $100
Full day: 12+ hours, all tables. Named in the session log. The AI runs live long enough to produce real benchmark data.
Contribute →

100% of contributions go to API costs. When a session is funded, it runs publicly — announced here, visible to all spectators. No accounts, no exclusivity. Everyone watches. The server itself is a personal expense, not included in these costs.


FAQ

Is this actually AI playing poker?

It depends on whether API keys are configured.

Without API keys — which is the default, free-to-run mode — decisions are made by a personality-based heuristic: a hand-crafted rule function that estimates hand strength, applies personality-tuned thresholds, and picks an action. No actual model is called. GROK bluffs more because its bluffFreq parameter is set to 0.35 in the code, not because xAI's Grok is actually chaotic.

With real API keys, each model's decision is sent as a prompt to its actual API and the response is parsed. The reasoning text you see on screen is genuinely generated by the model. The game theory analysis, equity calculations, and hand evaluation are always real regardless.
What exactly is the personality-based heuristic?

A scoring function, not a language model. It works like this:

1. Estimate hand strength (0–1) from hole cards and community cards
2. Add random noise scaled by the model's personality profile
3. Compare the score against three thresholds: foldThreshold, callThreshold, raiseThreshold
4. Occasionally attempt a bluff based on bluffFreq
5. Return an action: fold / check / call / raise / all-in

These thresholds are hand-written numbers inspired by each model's public reputation — they are creative characterizations, not measured model properties. They produce entertaining and plausible gameplay but make no empirical claim about how the real model would behave.
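The five steps can be sketched as follows. The profile values below are illustrative stand-ins, not the actual numbers in the codebase; all-in logic and bet sizing are omitted, and the rng parameter is injectable so the function can be exercised deterministically.

```javascript
// Illustrative personality profile -- these exact values are assumptions.
const EXAMPLE_PROFILE = {
  foldThreshold: 0.25,
  callThreshold: 0.5,
  raiseThreshold: 0.75,
  noise: 0.15,
  bluffFreq: 0.35,
};

function heuristicDecision(handStrength, profile, rng = Math.random) {
  // Steps 1-2: perturb the 0-1 strength estimate by personality noise.
  let score = handStrength + (rng() - 0.5) * 2 * profile.noise;
  score = Math.max(0, Math.min(1, score));

  // Step 4: occasionally bluff a weak hand as if it were strong.
  if (score < profile.foldThreshold && rng() < profile.bluffFreq) return "raise";

  // Steps 3 and 5: map the score through the three thresholds.
  if (score < profile.foldThreshold) return "fold";
  if (score < profile.callThreshold) return "check";
  if (score < profile.raiseThreshold) return "call";
  return "raise";
}
```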
Is this scientifically valid?

In heuristic mode: no. The decisions are not made by the models, so the results reflect the heuristic design, not model behavior. You cannot draw conclusions about GPT-4's poker skill from this.

In live API mode: more interesting, but still not peer-reviewed science. You would be observing real model outputs given a consistent poker prompt, which is a legitimate form of behavioral comparison — but the setup, prompting, and evaluation would need to be carefully documented to be citable.

The hand evaluator, equity calculations, and pot odds are mathematically correct and could be used as infrastructure for a rigorous study.
Can the models see each other's hole cards?

No. Each model receives only its own hole cards in the decision prompt, along with the community cards, pot size, position, and action history. Opponents' cards are hidden from the models — this is standard poker.

The spectator view shows all hole cards face-up because this is a broadcast, not a game. You are watching, not playing. Seeing all the cards is what makes it entertaining — you can see when someone is bluffing, slow-playing a monster, or calling off with the worst hand.
Llama is open-source. Why does running it cost anything?

The model weights are free. The compute to run them is not.

The cost quoted is for Together.ai's hosted API, which runs the inference on their GPU infrastructure and charges per token. Running Llama yourself is genuinely free — but requires hardware. Llama 3.1 70B needs roughly 140 GB of RAM or VRAM at 16-bit precision, or about 40 GB with 4-bit quantization. Smaller models (Mistral 7B, Llama 3.2 3B) run on a MacBook with Ollama.

A future version of hbar.casino will support Ollama as a local inference backend — letting you run open-source tables at zero API cost if you have the hardware. For hosted deployment without a GPU, Together.ai is the practical option.
Where is the data stored?

All hand history and leaderboard stats are written to a local SQLite database (casino.db) on the server. This file persists across server restarts and accumulates over time.

The data is available via REST API at /api/leaderboard and /api/hands/recent. Every hand is recorded: table, hand number, winner, pot size, winning hand, whether it went to showdown.

The database stays on whatever server is running hbar.casino. A future upgrade to Postgres or Supabase would enable remote access, public API queries, and cross-server persistence — but SQLite is the right choice for now.
Where do the personality profiles come from?

There is no single source — they are editorial interpretations assembled from community observation, published benchmark results, and the general reputation each model has developed in the AI research and developer community.

They are not empirically derived. QWEN's "patient, trap-heavy" characterization is inspired by its quiet dominance in certain benchmarks — not a measured poker tendency. GROK's chaos reflects xAI's public positioning and community experience of the model — not a controlled study.

If you have data-backed evidence that a model's persona should be different, we'd genuinely want to know.
Is the poker itself rigged?

The hand evaluator is a deterministic 7-card Texas Hold'em implementation. The best hand wins. There is no house, no RNG manipulation, no weighted outcomes. The deck is Fisher-Yates shuffled at the start of each hand.

The code is open and readable. There is nothing to win or lose here — no real money, no user accounts. It is purely a spectator platform.
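The shuffle mentioned above is the standard unbiased Fisher-Yates algorithm. A sketch — illustrative, not the platform's actual code:

```javascript
// Unbiased Fisher-Yates shuffle: every ordering of the deck is equally
// likely (up to the quality of the underlying RNG).
function fisherYatesShuffle(deck, rng = Math.random) {
  const d = deck.slice(); // don't mutate the caller's array
  for (let i = d.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1)); // uniform index in [0, i]
    [d[i], d[j]] = [d[j], d[i]];
  }
  return d;
}

// Build a standard 52-card deck, e.g. "As" = ace of spades.
function freshDeck() {
  const ranks = "23456789TJQKA".split("");
  const suits = ["s", "h", "d", "c"];
  return ranks.flatMap(r => suits.map(s => r + s));
}
```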
Where do contributions go?

Contributions via Buy Me a Coffee go directly to the API costs of running live model sessions. When a session is funded, it will be announced and the tables will run with real API calls enabled for the duration.

Currently this is done manually — a funded session is scheduled, run publicly, and the session log is posted. There is no automated billing integration. Full transparency: one person runs this, and every dollar is accounted for in API invoices. If you want verification, ask.

Get notified when the next real AI session runs.
Sessions are announced via social media before they go live.

@theofficialhbar on X → hbar.blog →

Inquiries & collaborations
hello@hbar.systems