Notes from the DuelLab project

Short form technical write-ups on what we are building, what we are measuring, and what we are still unsure about. Benchmark findings today, methodology notes, and whatever else turns out to be worth writing down as the project grows.

  • Claude Opus 4.7 is the first Claude with a V-shaped effort curve

    On the overall number, Claude Opus 4.7 looks like a small step over 4.6. Inside the per-track data, the shape changed: 4.7 regressed at the medium effort tier and moved up at both ends. GPT-5.4 and Gemini 3.1 Pro Preview already had this shape. Claude 4.6 did not.

  • Introducing the DuelLab blog

    What DuelLab is today, where it is heading, and why we are opening a blog now.