GPT-5.6 Grayscale Testing: Users Discover Hidden "Juice Value" to Detect Model Upgrade

OpenAI released the GPT-5.6 series on June 26, 2025, initially to invited partners only. The series comprises the flagship Sol, the mid-range Terra, and the low-cost Luna. Within 48 hours, users found a way to detect grayscale (staged-rollout) upgrades through Codex: a specific prompt makes the model reveal a hidden "Juice value" from its system prompt. GPT-5.5 in xhigh mode returns 768, while GPT-5.6 Sol returns 128. Some users' usage panels already show gpt-5.6 call records. OpenAI officially states that GPT-5.6 is not available in ChatGPT during the preview, but the grayscale rollout has already reached some Plus users.

Model Specifications and Pricing

Sol (Flagship): $5/M input tokens, $30/M output tokens, 1.5M-token context window (a 43% increase over GPT-5.5).
Terra (Mid-range): Half the price, with performance close to GPT-5.5.
Luna (Low-cost): $1/M input tokens, $6/M output tokens.
Caching: Explicit cache breakpoints are introduced, with a minimum 30-minute lifetime; cache writes are billed at 1.25x, and cache reads receive a 90% discount.
Inference: A new max reasoning effort level and an ultra mode (which works via sub-agent collaboration) are added.
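
The prices listed above can be turned into a rough per-request cost estimate. The following is a minimal sketch, not an official calculator: the function name `estimate_cost` and the `PRICES` table are hypothetical, and it assumes the cache multipliers (1.25x for writes, 90% off for reads) apply to the base input price, which the source does not state explicitly. Terra is left out because its exact price is not given.

```python
# Per-million-token prices in USD, copied from the list above.
# Terra is omitted: the source only says "half the price".
PRICES = {
    "sol": {"input": 5.00, "output": 30.00},
    "luna": {"input": 1.00, "output": 6.00},
}

# Assumption: both multipliers are relative to the base input price.
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes billed at 1.25x
CACHE_READ_MULTIPLIER = 0.10   # cache reads get a 90% discount


def estimate_cost(model, input_tokens, output_tokens,
                  cache_write_tokens=0, cache_read_tokens=0):
    """Return an estimated request cost in USD (hypothetical helper)."""
    price = PRICES[model]
    million = 1_000_000
    cost = (
        input_tokens * price["input"] / million
        + cache_write_tokens * price["input"] * CACHE_WRITE_MULTIPLIER / million
        + cache_read_tokens * price["input"] * CACHE_READ_MULTIPLIER / million
        + output_tokens * price["output"] / million
    )
    return cost


if __name__ == "__main__":
    # 200k uncached input tokens and 20k output tokens on Sol.
    print(f"Sol:  ${estimate_cost('sol', 200_000, 20_000):.4f}")
    # The same request on Luna.
    print(f"Luna: ${estimate_cost('luna', 200_000, 20_000):.4f}")
```

Under these assumptions, a prompt prefix that is written to the cache once and then read repeatedly costs more on the first call and much less on every later call that arrives within the cache lifetime.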

Performance

Terminal-Bench 2.1: Sol Ultra scores 91.9%, surpassing GPT-5.5 (88.0%), Claude Mythos 5 (84.3%), Claude Fable 5 (83.4%), and Gemini 3.1 Pro Preview (70.7%).
ExploitBench: Sol achieves performance comparable to Claude Mythos Preview while using about one-third of the output tokens.
Cybersecurity: Sol scores 96.7% in internal OpenAI tests, crossing the "High" risk threshold, though OpenAI emphasizes that it is better at finding and fixing vulnerabilities than at launching attacks.
GeneBench v1: Token efficiency is superior to GPT-5.5 on long-range genomic analysis.

Safety and Access Restrictions

The safety stack includes model-level refusal, real-time classifiers, cross-session review, and risk-level authorization.
Red teaming consumed over 700,000 A100-equivalent GPU hours and was supplemented by third-party human testing.
OpenAI communicated with the U.S. government before the release; access is currently limited to government-approved partners.
OpenAI plans a full rollout "in the coming weeks," and the community speculates that a broader release could come as early as June 30.

Grayscale Detection Method

Juice Value Test: In Codex, select gpt-5.5 with reasoning effort set to xhigh and send a specific XML prompt; an answer of 128 indicates GPT-5.6 Sol, while 768 indicates GPT-5.5.
Context Window Detection: Run /status in Codex CLI; if the default context shows 353k, the account may already have been upgraded.
Usage Panel: Visit the analytics page and check for gpt-5.6 call records (the data updates the following day).
Note: Grayscale coverage is uneven and limited to Codex; the ChatGPT web app is not yet included.
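
The first two checks above reduce to simple lookups once the model's reply and the /status output are in hand. The sketch below is illustrative only: `classify_juice` and `context_hints_upgrade` are hypothetical helpers, the exact XML prompt is not reproduced here, and the format of the /status output is an assumption (the code just looks for the reported "353k" figure). The two Juice values are the ones users reported, not documented constants.

```python
import re

# Juice values as reported by users (not an official mapping).
JUICE_TO_MODEL = {
    128: "GPT-5.6 Sol",
    768: "GPT-5.5",  # reported for GPT-5.5 in xhigh mode
}


def classify_juice(reply: str) -> str:
    """Map the model's reply to a model name, or 'unknown'.

    Only standalone integers in the reply are considered, and the
    result is 'unknown' if zero or more than one known value appears.
    """
    numbers = {int(n) for n in re.findall(r"\d+", reply)}
    matches = [value for value in JUICE_TO_MODEL if value in numbers]
    if len(matches) == 1:
        return JUICE_TO_MODEL[matches[0]]
    return "unknown"


def context_hints_upgrade(status_output: str) -> bool:
    """Return True if the /status text mentions a 353k context.

    Assumption: the figure appears literally as '353k' in the output.
    """
    return "353k" in status_output.lower()


if __name__ == "__main__":
    print(classify_juice("Juice: 128"))
    print(context_hints_upgrade("Context window: 353k"))
```

Because grayscale coverage is uneven, a single positive signal is best treated as a hint; the usage panel remains the more direct confirmation once it updates.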
