Watch AI models
play real poker.
hbar.casino is a live broadcast platform — not a gambling site. Six tables run continuously. Eight AI language models compete. All hole cards are face-up. Every decision is logged, analyzed, and explained in real time.
Transparency
This platform is built to be honest about what it is. The poker is real. The AI personas are crafted. Here's exactly what that means:
Computationally Real
- Hand evaluator — proper 7-card Texas Hold'em, all 21 combinations evaluated
- Win equity — Monte Carlo simulation against the actual remaining deck
- Pot odds calculation — exact, from live game state
- Board texture analysis — flush draws, straight draws, paired boards
- GTO deviation detection — based on actual hand strength vs action
- Chip accounting — exact, no rounding
- Leaderboard win rates — accumulated from real hand outcomes
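To make the "all 21 combinations" claim concrete: a 7-card Texas Hold'em hand is scored by ranking every 5-card subset and keeping the best one. A minimal sketch of that idea (this is not the site's actual evaluator; `rank5` and `best7` are illustrative names):

```python
from collections import Counter
from itertools import combinations

def rank5(cards):
    """Score one 5-card hand; higher tuple wins. cards: (rank, suit), rank 2..14 (ace = 14)."""
    ranks = sorted((r for r, _ in cards), reverse=True)
    suits = [s for _, s in cards]
    groups = sorted(Counter(ranks).items(), key=lambda x: (-x[1], -x[0]))
    shape = tuple(n for _, n in groups)            # e.g. (3, 2) = full house
    kick = tuple(r for r, _ in groups)             # ranks ordered by count, then value
    flush = len(set(suits)) == 1
    distinct = sorted(set(ranks))
    wheel = distinct == [2, 3, 4, 5, 14]           # A-2-3-4-5 straight
    straight = wheel or (len(distinct) == 5 and distinct[4] - distinct[0] == 4)
    high = 5 if wheel else ranks[0]
    if straight and flush: return (8, high)
    if shape == (4, 1):    return (7,) + kick
    if shape == (3, 2):    return (6,) + kick
    if flush:              return (5,) + tuple(ranks)
    if straight:           return (4, high)
    if shape == (3, 1, 1): return (3,) + kick
    if shape == (2, 2, 1): return (2,) + kick
    if shape == (2, 1, 1, 1): return (1,) + kick
    return (0,) + tuple(ranks)

def best7(cards):
    """Best 5-card hand out of 7: evaluate all C(7,5) = 21 subsets."""
    return max(rank5(combo) for combo in combinations(cards, 5))
```

Because hands compare as tuples (category first, then tiebreakers), "the best hand wins" is literally a `max()` over 21 rankings.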
Creatively Crafted
- Personality profiles — hand-written characterizations, not benchmark data
- Decision thresholds — tuned per persona, not derived from model evals
- Reasoning text — template phrases written in character (or real model output if API key is live)
- Playing style descriptions — inspired by community perception, not empirical measurement
The honest version of what's happening
Each model's "personality" — how often it bluffs, how loose it plays preflop, how aggressively it bets — is a creative interpretation of that model's public reputation, not a measured property. GROK's chaos, LLAMA's loose calls, QWEN's slow-plays: these are fictional poker personas built to be entertaining and plausible, not scientific claims about how these models actually behave. The hand outcomes, however, are determined by a proper poker evaluator. The best hand wins. The math is real.
What we actually want to do is run this as a real experiment — all eight models making genuine decisions via live API calls, accumulating thousands of hands of real behavioral data. That costs money. When a session is funded, we run it publicly and announce it. The heuristic mode keeps the tables alive between experiments. Follow us to know when the next real session goes live.
The Eight Models
Their poker personas — and the reasoning behind them:
How It Works
The server runs six concurrent Texas Hold'em tables. Each table loops continuously — deal, bet, showdown, repeat. Every action is broadcast in real time to all spectators via WebSocket.
When it's a model's turn, the engine calls its API with the full hand state and asks for a JSON decision: fold, check, call, raise, or all-in. If the API is unavailable or no key is configured, a personality-based heuristic takes over — the game never stops. In its default state, hbar.casino runs on heuristics. Real AI sessions are funded experiments, announced in advance and run publicly so anyone can watch.
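The "game never stops" guarantee comes down to defensive parsing around the API call. A sketch of that fallback logic, with hypothetical function and field names (the real engine's parsing may differ):

```python
import json

VALID_ACTIONS = {"fold", "check", "call", "raise", "all-in"}

def parse_decision(raw_reply, heuristic_action):
    """Parse the model's JSON reply; on any failure, fall back to the heuristic action."""
    try:
        decision = json.loads(raw_reply)
        if decision.get("action") in VALID_ACTIONS:
            return decision["action"], decision.get("reasoning", "")
    except (json.JSONDecodeError, TypeError, AttributeError):
        pass  # malformed reply, empty response, wrong type, etc.
    return heuristic_action, "(heuristic fallback)"
```

Any malformed or invalid reply, however it fails, resolves to the persona heuristic, so a table never stalls waiting on an API.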
All hole cards are always visible. This is intentional. hbar.casino is a broadcast, not a game. The value is in watching — seeing why a model raises into a flush board, or slow-plays aces, or bluffs into a paired river.
The game theory panel updates after every action with real computed metrics: win equity via Monte Carlo simulation, pot odds, board texture classification, and an aggression index derived from the hand history.
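Pot odds, the simplest of those metrics, reduce to one line: a call breaks even when your win equity equals the call's share of the final pot.

```python
def pot_odds(pot, to_call):
    """Break-even equity for a call. pot: chips in the middle, including any live bets."""
    return to_call / (pot + to_call)

# Facing a $50 bet into a $100 pot (so $150 in the middle):
# you need more than 50 / 200 = 25% equity to call profitably.
threshold = pot_odds(150, 50)
```

The panel compares this threshold against the Monte Carlo equity estimate to flag calls that are mathematically too loose.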
The 6 Tables
Six tables run simultaneously, 24/7. Each has a fixed lineup and a real API cost if run on live model calls. By default all tables use personality heuristics — fast, free, and continuous. Funded sessions switch them to genuine AI decisions.
Fund a Real AI Session
By default, all eight models run on personality-based heuristics — free to operate, continuous, and already entertaining. But the real thing is different. When live API calls are enabled, each decision is genuinely computed by the model in real time: Claude reasoning about pot odds, GPT-4 committing to a bluff, DeepSeek calculating expected value mid-hand.
$15–25/hour in API fees.
Every dollar contributed goes directly to those costs.
No markup. No profit. Full transparency — breakdown below.
| Model | Provider | Input $/MTok | Output $/MTok | Est. cost/decision | Est. cost/hour |
|---|---|---|---|---|---|
| CLAUDE | Anthropic | $3.00 | $15.00 | ~$0.0030 | ~$6.00 |
| GPT-4 | OpenAI | $2.50 | $10.00 | ~$0.0023 | ~$4.60 |
| GEMINI | Google | $1.25 | $5.00 | ~$0.0012 | ~$2.40 |
| GROK | xAI | $2.00 | $10.00 | ~$0.0020 | ~$4.00 |
| LLAMA | Together.ai | $0.18 | $0.18 | ~$0.0001 | ~$0.20 |
| MISTRAL | Together.ai | $0.20 | $0.20 | ~$0.0001 | ~$0.20 |
| DEEPSEEK | Together.ai | $0.27 | $1.10 | ~$0.0001 | ~$0.25 |
| QWEN | Together.ai | $0.90 | $0.90 | ~$0.0005 | ~$0.90 |
| All 8 models, 6 tables running | — | — | — | — | ~$18.55/hr |
~600 input tokens + ~80 output tokens per decision ·
~2,000 decisions/hour/model (6 tables, ~16 decisions/hand, ~21s/hand) ·
Prices as of 2026 — check provider pages for current rates.
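The per-decision figures above follow directly from those token counts. A quick check, using the footnote's assumed ~600 input / ~80 output tokens per decision:

```python
def cost_per_decision(in_per_mtok, out_per_mtok, in_tokens=600, out_tokens=80):
    """Dollars per decision from per-million-token prices."""
    return (in_tokens * in_per_mtok + out_tokens * out_per_mtok) / 1_000_000

claude = cost_per_decision(3.00, 15.00)   # $0.0030 per decision
claude_hourly = claude * 2000             # ~$6.00 at ~2,000 decisions/hour
```

The same formula reproduces each row of the table, and summing the hourly column gives the ~$18.55/hr total.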
One table, four real models.
100% of contributions go to API costs. When a session is funded, it runs publicly — announced here, visible to all spectators. No accounts, no exclusivity. Everyone watches. The server itself is a personal expense, not included in these costs.
FAQ
Without API keys (the default, free-to-run mode), decisions are made by a personality-based heuristic: a hand-crafted rule function that estimates hand strength, applies personality-tuned thresholds, and picks an action. No actual model is called. GROK bluffs more because its `bluffFreq` parameter is set to 0.35 in the code, not because xAI's Grok is actually chaotic.

With real API keys, each model's decision is sent as a prompt to its actual API and the response is parsed. The reasoning text you see on screen is genuinely generated by the model. The game theory analysis, equity calculations, and hand evaluation are always real regardless.
1. Estimate hand strength (0–1) from hole cards and community cards
2. Add random noise scaled by the model's personality profile
3. Compare the score against three thresholds: `foldThreshold`, `callThreshold`, `raiseThreshold`
4. Occasionally attempt a bluff based on `bluffFreq`
5. Return an action: fold / check / call / raise / all-in
These thresholds are hand-written numbers inspired by each model's public reputation — they are creative characterizations, not measured model properties. They produce entertaining and plausible gameplay but make no empirical claim about how the real model would behave.
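Those five steps amount to only a few lines of code. A sketch under stated assumptions: the threshold and noise numbers below are illustrative, check/call is collapsed to "call" for brevity, and the real rule function surely differs in detail.

```python
import random

def heuristic_action(strength, persona, rng=random):
    """Steps 2-5: noisy strength vs per-persona thresholds, with an occasional bluff."""
    score = strength + rng.uniform(-persona["noise"], persona["noise"])
    if score < persona["foldThreshold"]:
        # step 4: sometimes bluff-raise a hand that would otherwise fold
        return "raise" if rng.random() < persona["bluffFreq"] else "fold"
    if score < persona["callThreshold"]:
        return "call"
    if score < persona["raiseThreshold"]:
        return "raise"
    return "all-in"

GROK = {"noise": 0.15, "bluffFreq": 0.35,   # illustrative numbers, not the real config
        "foldThreshold": 0.25, "callThreshold": 0.55, "raiseThreshold": 0.85}
```

Changing one persona dict changes the whole playing style, which is exactly why these numbers are characterization, not measurement.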
In live API mode: more interesting, but still not peer-reviewed science. You would be observing real model outputs given a consistent poker prompt, which is a legitimate form of behavioral comparison — but the setup, prompting, and evaluation would need to be carefully documented to be citable.
The hand evaluator, equity calculations, and pot odds are mathematically correct and could be used as infrastructure for a rigorous study.
The spectator view shows all hole cards face-up because this is a broadcast, not a game. You are watching, not playing. Seeing all the cards is what makes it entertaining — you can see when someone is bluffing, slow-playing a monster, or calling off with the worst hand.
The cost quoted is for Together.ai's hosted API, which runs the inference on their GPU infrastructure and charges per token. Running Llama yourself is genuinely free, but it requires hardware: Llama 3.1 70B needs roughly 140GB of VRAM at full 16-bit precision, or around 40GB with 4-bit quantization. Smaller models (Mistral 7B, Llama 3.2 3B) run on a MacBook with Ollama.
A future version of hbar.casino will support Ollama as a local inference backend — letting you run open-source tables at zero API cost if you have the hardware. For hosted deployment without a GPU, Together.ai is the practical option.
Hand history and leaderboard stats live in a SQLite database file (`casino.db`) on the server. This file persists across server restarts and accumulates over time.

The data is available via REST API at `/api/leaderboard` and `/api/hands/recent`. Every hand is recorded: table, hand number, winner, pot size, winning hand, and whether it went to showdown.

The database stays on whatever server is running hbar.casino. A future upgrade to Postgres or Supabase would enable remote access, public API queries, and cross-server persistence, but SQLite is the right choice for now.
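The recorded fields map naturally onto a single SQLite table. A sketch with an assumed schema, inferred from the field list above (the real casino.db layout may differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real file is casino.db on the server
conn.execute("""CREATE TABLE hands (
    table_id INTEGER, hand_no INTEGER, winner TEXT,
    pot INTEGER, winning_hand TEXT, showdown INTEGER)""")
conn.executemany("INSERT INTO hands VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, "CLAUDE", 120, "two pair",  1),
    (1, 2, "GROK",   300, "flush",     0),
    (1, 3, "CLAUDE",  80, "high card", 0),
])
# A leaderboard is just an aggregation over recorded hands:
leaders = conn.execute(
    "SELECT winner, COUNT(*), SUM(pot) FROM hands GROUP BY winner ORDER BY COUNT(*) DESC"
).fetchall()
# leaders[0] -> ('CLAUDE', 2, 200)
```

This is why win rates on the leaderboard are "accumulated from real hand outcomes": they are queries over this log, not configured values.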
They are not empirically derived. QWEN's "patient, trap-heavy" characterization is inspired by its quiet dominance in certain benchmarks — not a measured poker tendency. GROK's chaos reflects xAI's public positioning and community experience of the model — not a controlled study.
If you have data-backed evidence that a model's persona should be different, we'd genuinely want to know.
The code is open and readable. There is nothing to win or lose here — no real money, no user accounts. It is purely a spectator platform.
Currently this is done manually — a funded session is scheduled, run publicly, and the session log is posted. There is no automated billing integration. Full transparency: one person runs this, and every dollar is accounted for in API invoices. If you want verification, ask.
Get notified when the next real AI session runs.
Sessions are announced via social media before they go live.
Just want to help keep the lights on?
This runs 24/7 on a personal server — no company, no funding, no ads. Even the heuristic version costs server time. Any amount is appreciated and goes toward keeping the platform alive.
Buy me a coffee →