Context Is (Almost) Everything: Llama-3 on Structured Output and AMR Parsing
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
This paper evaluates the ability of an open-source LLM (Llama-3.1) to compute sentence-level semantics and encode it in a formal language. We compare two versions of the model on the task of generating a meaning representation graph for a given English sentence in the form of Abstract Meaning Representation. We explore the model's in-context learning capability, comparing zero-shot prompting to few-shot demonstrations of varying levels of specificity. We find that Llama-3.1 frequently makes errors when reproducing the syntactic structure of both seen and unseen structured output, and that it only achieves near-SotA parsing performance when shown highly specific demonstrations similar in structure to the target sentence graph. We include an in-depth analysis of the model output, considering performance through the lens of fine-grained semantic phenomena, graph properties (e.g., top node accuracy), and graph complexity.
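AMR graphs are conventionally serialized in PENMAN notation, where each node is introduced as `(variable / concept)` and relations nest recursively; the structural errors discussed above are typically bracket-matching failures in this serialization. As a minimal sketch (the example sentence and helper function are illustrative, not taken from the paper's evaluation code), a simple balance check can flag such malformed output; note it ignores parentheses inside quoted string constants, which a full PENMAN parser would handle:

```python
def is_balanced(amr: str) -> bool:
    """Check that parentheses in a PENMAN-style AMR string are balanced.

    This is only a surface well-formedness test, not a full PENMAN parse.
    """
    depth = 0
    for ch in amr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket with no matching opener
                return False
    return depth == 0

# "The boy wants to go" in PENMAN notation (illustrative example):
good = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))"
# A typical model error: one closing bracket dropped.
bad = "(w / want-01 :ARG0 (b / boy :ARG1 (g / go-02 :ARG0 b))"

print(is_balanced(good))  # True
print(is_balanced(bad))   # False
```

A check of this kind only catches surface-level serialization errors; semantic accuracy of the generated graphs is what metrics such as Smatch evaluate.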