AyahVerse at NakbaArchiveClassifier Shared Task: Architectural Trade-offs and Decision Calibration for Humanitarian Image Classification

Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026

Abstract

This paper presents our submission to the Nakba-NLP 2026 Shared Task on binary image classification, in which images of Gaza infrastructure are labeled as destroyed or intact. To address class imbalance and resource-constrained deployment, we evaluated three convolutional architectures (ResNet50, MobileNetV2, and EfficientNet-B0), each combined with a post-hoc threshold optimization step. Our results show that lightweight architectures are competitive with heavier models on this task: EfficientNet-B0 achieved the highest Test F1-score of 0.85 despite having far fewer parameters than ResNet50. We further investigated the effect of input resolution and found that higher resolution improved ResNet50’s performance, although it still trailed the lightweight models. Finally, we show that shifting the binary decision threshold from the default 0.50 to an optimized 0.45 raised ResNet50’s Test F1-score from 0.79 to 0.81 by recovering recall on the minority destroyed class. This adjustment was needed only for ResNet50. EfficientNet-B0 and MobileNetV2 performed best at the default 0.50, which suggests that the larger model was more prone to majority-class bias in our setting. Overall, these results provide a systematic analysis of architectural efficiency and threshold behavior under class imbalance, offering practical insights for damage classification in resource-constrained crisis settings.
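The post-hoc threshold optimization mentioned above can be sketched as a simple grid search on held-out predictions. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that the classifier outputs a probability for the destroyed (positive, label 1) class on a validation split and that F1 is measured on that class. The function names `f1_at_threshold` and `tune_threshold`, the 0.05-step grid, and the toy scores are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def f1_at_threshold(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 for the positive (destroyed = 1) class, predicting 1 iff prob >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(
    probs: Sequence[float],
    labels: Sequence[int],
    grid: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Return (best_threshold, best_f1) over a candidate grid (hypothetical 0.05 steps)."""
    if grid is None:
        grid = [round(i * 0.05, 2) for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        score = f1_at_threshold(probs, labels, t)
        if score > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, score
    return best_t, best_f1


# Hypothetical validation scores: two destroyed images sit just below 0.50.
val_probs = [0.92, 0.81, 0.47, 0.46, 0.42, 0.55, 0.20, 0.10]
val_labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(f"default 0.50: F1={f1_at_threshold(val_probs, val_labels, 0.50):.3f}")  # 0.571
t, f1 = tune_threshold(val_probs, val_labels)
print(f"tuned {t:.2f}: F1={f1:.3f}")  # tuned 0.45: F1=0.889
```

On this toy data, lowering the threshold to 0.45 recovers the two positives scored at 0.46 and 0.47. This mirrors the recall-recovery effect described in the abstract, but the numbers are illustrative only. In practice, the threshold would be selected on validation data and then fixed before scoring the test set.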