PaintBench

Tasks that are easy in a raster editor remain out of reach for frontier image editing models.

The highest-performing model we evaluated (Nano Banana 2) scores only 17.1%.

Model Performance


PaintBench scores are reported as mIoU (mean intersection over union).
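The page does not say how mIoU is computed for an edited image. Here is a minimal sketch that assumes each output pixel is reduced to a discrete label (for example, a palette color) and compared against a reference image of the same size. It computes intersection over union for each label and averages the results over every label that appears in either image. The function name `mean_iou` and the label-grid representation are hypothetical and are not PaintBench's evaluation code.

```python
from collections import Counter
from collections.abc import Hashable, Sequence

# Hypothetical representation: an image as rows of discrete per-pixel labels.
LabelGrid = Sequence[Sequence[Hashable]]


def mean_iou(pred: LabelGrid, target: LabelGrid) -> float:
    """Mean IoU over every label present in either grid (illustrative sketch only)."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("pred and target must have the same shape")
    intersection: Counter = Counter()  # pixels where both grids carry the label
    union: Counter = Counter()         # pixels where either grid carries the label
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                intersection[p] += 1
                union[p] += 1
            else:
                # A mismatched pixel is in the union of both labels and the intersection of neither.
                union[p] += 1
                union[t] += 1
    if not union:
        raise ValueError("grids are empty")
    # Reading a missing Counter key returns 0, so labels with no overlap score IoU 0.
    return sum(intersection[label] / union[label] for label in union) / len(union)


reference = [["white", "white"], ["red", "red"]]
edited = [["white", "red"], ["red", "red"]]
print(mean_iou(reference, reference))  # 1.0
print(mean_iou(edited, reference))     # white: 1/2, red: 2/3, mean 7/12 ≈ 0.583
```

The sketch skips labels that are absent from both images, so those labels do not count as free perfect matches. Other conventions, such as scoring against a fixed label set, would produce different numbers.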

We also built TinyGrafixBench

Does PaintBench performance predict how well a model edits data visualizations? To find out, we created TinyGrafixBench: a procedurally generated, deterministically evaluated benchmark of 20 data visualization editing tasks spanning 5 chart types.
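A toy bar-chart task can illustrate what "procedurally generated, deterministically evaluated" means. In this sketch, which is an assumption for illustration only, a seeded random generator fixes the original bar heights and the requested edit. Because the correct answer is known from the generator, grading reduces to an exact comparison. `BarEditTask`, `generate_task`, `expected_values`, and `evaluate` are hypothetical names. The sketch also omits reading bar heights back out of a model's output image, and the page does not describe TinyGrafixBench's actual chart types or checks.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BarEditTask:
    """A hypothetical chart-editing task: change one bar to a new height."""
    values: tuple[int, ...]  # original bar heights, left to right
    index: int               # zero-based index of the bar to edit
    new_value: int           # height the edited bar should have
    instruction: str         # natural-language prompt given to the model


def generate_task(seed: int, n_bars: int = 5) -> BarEditTask:
    # A private, seeded RNG makes generation reproducible: same seed, same task.
    rng = random.Random(seed)
    values = tuple(rng.randint(1, 10) for _ in range(n_bars))
    index = rng.randrange(n_bars)
    # Pick a target height that differs from the current one, so the edit is never a no-op.
    new_value = rng.choice([v for v in range(1, 11) if v != values[index]])
    instruction = f"Set bar {index + 1} to a height of {new_value}."
    return BarEditTask(values, index, new_value, instruction)


def expected_values(task: BarEditTask) -> tuple[int, ...]:
    # The ground truth follows directly from the generator's parameters.
    edited = list(task.values)
    edited[task.index] = task.new_value
    return tuple(edited)


def evaluate(task: BarEditTask, observed: tuple[int, ...]) -> bool:
    # Deterministic grading: exact match against the known answer, with no human or model judge.
    return tuple(observed) == expected_values(task)


task = generate_task(seed=7)
print(task.instruction)
print(evaluate(task, expected_values(task)))  # a perfect edit passes
print(evaluate(task, task.values))            # leaving the chart unchanged fails
```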

Finding: PaintBench and TinyGrafixBench scores are strongly correlated, which suggests that the operations PaintBench tests generalize to applied, precise visual editing tasks.
Rank   Model           PaintBench   TinyGrafixBench
1      Nano Banana 2   17.1%        15.9%
2      GPT Image 2     16.3%        15.6%
3      Nano Banana 1   11.1%        5.3%
4–11   All others      ≤6.7%        ≤3.4%
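A correlation like the one in the finding is usually measured with a coefficient such as Pearson's r. The page does not say which statistic was used, so Pearson is an assumption here. The sketch below computes r from two score lists; `pearson_r` is a hypothetical helper. As a usage example, it takes only the three models whose exact scores are listed. Three points are far too few to support the finding on their own, and this is not the authors' analysis.

```python
import math
from collections.abc import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and the two spread terms (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Exact scores (%) for the top three models: Nano Banana 2, GPT Image 2, Nano Banana 1.
paintbench = [17.1, 16.3, 11.1]
tinygrafixbench = [15.9, 15.6, 5.3]
print(pearson_r(paintbench, tinygrafixbench))
```

A rank-based statistic such as Spearman's rho would be an equally plausible choice for a leaderboard. Computed over all eleven models, either statistic could differ from this three-point illustration.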