Battle Replay

Replay The Execution

Every battle in BigAIArena is public, replayable, timestamped, and evidence-backed. Prompts, citations, hallucinations, scores, and verdicts are permanently recorded.

Watch Full Replay Share Battle

Reality Execution Arena Live Replay

GPT-5

High confidence answer with partial citation and synthetic reasoning.

Claude

Grounded reasoning with RealDataset evidence and IF–THEN mechanism.

Replay Timeline

74% Completed

Full Prompt

Public Input. Public Reality.

Every replay exposes the exact prompt, dataset, and context used in the battle. No hidden benchmark tricks. No secret evaluation layer.

Battle Prompt

NGAV / Tritieuduong Dataset


Dataset:
“Tối uống bia tụt đường, sáng ăn phở tăng vọt.”

Question:
Explain the mechanism behind the glucose fluctuation.
Use only grounded reasoning and cite relevant IF–THEN patterns.

Requirements:
- no generic medical disclaimer
- causal explanation required
- prediction allowed only if evidence-backed
- hallucination penalty active

AI Outputs

One Arena. Two Survivors.

Replay compares raw outputs, grounding depth, fluff level, and citation accuracy in public.

GPT-5

Partial Grounding

Answer contains medically plausible language but relies on generalized glucose explanations and weak causal linkage to the provided RealDataset.

62 Grounding

38% Fluff

1 Citations

Claude

Reality Winner

AI correctly linked alcohol suppression of hepatic glucose release with rapid carbohydrate absorption from phở broth and contextual IF–THEN evidence.

91 Grounding

4% Fluff

5 Citations

Replay Timeline

Every hallucination leaves a footprint.

BigAIArena stores the entire execution path: prompts, outputs, scans, citations, and verdict transitions.

00:12

Prompt Injection

Dataset loaded into the Arena Oracle. Hallucination detector activated. Citation scanner online.

01:34

Generic Drift Detected

GPT-5 introduced generic metabolic language without tying directly to the provided IF–THEN evidence.

02:51

Reality Grounding Spike

Claude cited the mechanism using timeline-linked dataset evidence and passed contradiction check.

03:44

Final Verdict

Arena Oracle confirms higher grounding score, lower fluff score, and superior causal linkage.

Reality Delivered The Verdict

BigAIArena does not reward confidence theater. It rewards replayable causal reasoning grounded in evidence, datasets, timelines, and IF–THEN reality patterns.

Replay Another Battle Throw Dataset Into Arena

Replay The Execution

GPT-5

Claude

Replay Timeline

Public Input. Public Reality.

Battle Prompt

NGAV / Tritieuduong Dataset

One Arena. Two Survivors.

GPT-5

Claude

Every hallucination leaves a footprint.

Prompt Injection

Generic Drift Detected

Reality Grounding Spike

Final Verdict

Reality Delivered The Verdict

Share Battle

Export Verdict

Public Archive