Evaluating the Abilities of LLMs and SpeechLMs in Discovering Implicit Contents of Italian Political Speeches

Proceedings of the Second Workshop on Building Educational Applications Using NLP

Abstract

This research investigates the pragmatic competence of Large Language Models (LLMs) and speech-based models (SpeechLMs) in interpreting implicit meanings within Italian political discourse. Using IMPAQTS-PIDMM, a multimodal benchmark derived from the 2.5-million-token IMPAQTS corpus, the experiment evaluates how effectively models identify tendentious content such as presuppositions and implicatures. The study compares text-only LLMs against SpeechLMs that process both audio and transcriptions, to determine whether acoustic cues enhance understanding. The results reveal that text-only models significantly outperform the multimodal variants, with Qwen2.5-72B achieving the highest global accuracy of 0.863. Surprisingly, the inclusion of audio did not improve performance: SpeechLMs such as GPT-4o-mini-audio-preview and Qwen2-Audio-7B-Instruct obtained lower accuracy and missed answers more frequently than their text-only equivalents. Across all tested architectures, models generally handled presuppositions better than implicatures.