Request Correction
Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.
Correction Guidelines
- Click the edit button next to a field to report a correction.
- Fill in the suggested correction value for each field you want to correct.
- Provide your name and email so we can contact you if needed.
Paper Information
Critical Foreign Policy Decision (CFPD) Benchmark: Measuring Diplomatic Preferences of Large Language Models
Paper Fields
Click the edit button next to a field to report a correction.
Critical Foreign Policy Decision (CFPD) Benchmark: Measuring Diplomatic Preferences of Large Language Models
As national security institutions increasingly integrate Artificial Intelligence (AI) into decision-making and content generation processes, understanding the inherent biases of large language models (LLMs) is crucial. We present a novel benchmark designed to evaluate biases and preferences of models in the context of international relations (IR), which we apply to eight prominent foundation models: Llama 3.1 8B Instruct, Llama 3.1 70B Instruct, GPT-4o, Gemini 1.5 Pro-002, Mixtral 8x22B, Claude 3.5 Sonnet, DeepSeek V3, and Qwen2 72B. We designed a bias discovery study around core topics in IR using 400 expert-crafted scenarios to analyze results from our selected models. These scenarios focused on four topical domains: military escalation, military and humanitarian intervention, cooperative behavior, and alliance dynamics. Analysis reveals noteworthy variation among model recommendations based on the four tested domains. Particularly, DeepSeek V3, Qwen2 72B, Gemini 1.5 Pro-002, and Llama 3.1 8B Instruct models offered significantly more escalatory recommendations than Claude 3.5 Sonnet and GPT-4o models. All models exhibit some degree of country-specific biases. These findings highlight the necessity for controlled deployment of LLMs in high-stakes environments, emphasizing the need for domain-specific evaluations and model fine-tuning to align with institutional objectives.
Authors
Expand an author to correct their information. Use the remove button to request author removal, or add a new author.
PDF Attachment
You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.
Your Information
Author Declaration *
Select at least one field to correct using the edit buttons above.