From Scenes to Semantics: PersianCLEVR for Bilingual 3D Visual Reasoning
We introduce PersianCLEVR, the first bilingual benchmark for 3D visual reasoning in English and Persian. It is built from CLEVR, Super-CLEVR, and ClevrTex, with newly synthesized question-answer pairs. The benchmark evaluates vision-language models (VLMs) on attributes, counting, comparison, spatial relations, and logic, and reveals strong surface-level understanding but clear weaknesses in genuine 3D and compositional reasoning.
PrismSSL