ARHAHA 2026: The Shared Task on Arabic Humor Automatic Generation
The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks
Abstract
Humor generation remains one of the most challenging tasks in natural language processing, particularly in Arabic, where cultural context, dialectal variation, and linguistic nuance are central to comedic effect. In this paper, we present the ARHAHA 2026 shared task on constrained Arabic humor generation. The task requires systems to generate jokes that incorporate a given pair of words while adhering to safety and cultural constraints. We describe the task design, dataset construction, and evaluation framework, which combines automatic validation with human evaluation. Nine teams registered for the shared task. Three of them submitted final system outputs, and two provided system description papers. Each participating system generated 1,200 Arabic jokes, from which a subset of 300 was selected for evaluation by three independent annotators. The evaluation covered humor quality, originality, lexical constraint compliance, and safety. The results show that the participating systems can produce safe and original content but struggle to produce genuinely humorous output: the top-performing system was judged humorous in only 5.01% of its outputs, highlighting the inherent difficulty of computational humor generation. All three systems maintained very low rates of policy violations and stereotyping, demonstrating the effectiveness of constrained generation for safe content production. The very low humor rates, however, reveal a substantial gap between generating fluent, constraint-compliant text and producing genuinely funny content. The top-performing system outperforms the other systems on originality, lexical compliance, and safety, reaching a final score of 49.25, compared with 44.62 for the second-ranked system and 35.99 for the third-ranked system. These results indicate that humor itself, rather than safety or constraint adherence, is the dominant bottleneck in constrained Arabic humor generation.