Request Correction
Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.
Correction Guidelines
- Click the edit button next to a field to report a correction.
- Fill in the suggested correction value for each field you want to correct.
- Provide your name and email so we can contact you if needed.
Paper Information
Learning Long-Document Embeddings via Chunk–Context Entailment
Paper Fields
Learning faithful embeddings for long documents remains challenging, especially in domains such as law and medicine, where inputs are long, structured, and semantically heterogeneous. We introduce the Chunk Prediction Encoder (CPE), a self-supervised framework that treats chunk–context compatibility as an unsupervised NLI problem. Given a document, CPE masks a chunk and trains with (i) a contrastive objective that aligns the masked document with its held-out chunk against in-batch negatives and (ii) a binary entailment head that predicts whether a candidate chunk belongs to the document. This joint objective encourages both geometric smoothness and directional semantic consistency, yielding robust document-level embeddings. We evaluate CPE with hierarchical and sparse-attention backbones on five benchmarks spanning the legal and biomedical domains, under both frozen-embedding and end-to-end fine-tuning protocols. CPE consistently outperforms the baselines and is more compute-efficient than prompt-only LLM baselines under matched token budgets. Ablations examine the effects of chunk length, the balance between the contrastive and entailment objectives, and the skimming strategy.
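For readers checking the abstract's description of the objective, here is a minimal NumPy sketch of a masked-chunk setup with a joint contrastive-plus-entailment loss. It assumes an encoder has already produced one embedding per masked document and one per held-out chunk. Everything specific in it is an illustrative assumption, not a detail taken from the paper: the masking rule (one contiguous span replaced by a single [MASK] token), the doc-to-chunk InfoNCE direction, the NLI-style pair features [d, c, |d - c|, d * c] for the entailment head, the weight `lam`, the temperature `tau`, and the choice of negatives (shifting chunks by one position within the batch). All function names are hypothetical.

```python
import numpy as np


def mask_chunk(tokens, chunk_len, rng):
    """Hypothetical masking rule: remove one contiguous span of `chunk_len`
    tokens and replace it with a single "[MASK]" placeholder."""
    start = int(rng.integers(0, len(tokens) - chunk_len + 1))
    chunk = tokens[start:start + chunk_len]
    context = tokens[:start] + ["[MASK]"] + tokens[start + chunk_len:]
    return context, chunk


def log_softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def info_nce(doc_emb, chunk_emb, tau=0.05):
    """Contrastive loss: each masked document should score its own chunk
    (diagonal) above the other chunks in the batch (in-batch negatives)."""
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    c = chunk_emb / np.linalg.norm(chunk_emb, axis=1, keepdims=True)
    logits = d @ c.T / tau  # (B, B) cosine similarities scaled by 1/tau
    return float(-np.mean(np.diag(log_softmax(logits, axis=1))))


def entailment_logits(doc_emb, chunk_emb, w, b):
    """Hypothetical linear entailment head over NLI-style pair features."""
    feats = np.concatenate(
        [doc_emb, chunk_emb, np.abs(doc_emb - chunk_emb), doc_emb * chunk_emb],
        axis=1,
    )
    return feats @ w + b  # one logit per (document, chunk) pair


def bce_with_logits(logits, labels):
    # Numerically stable binary cross-entropy on raw logits.
    return float(np.mean(
        np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    ))


def cpe_loss(doc_emb, chunk_emb, w, b, lam=0.5, tau=0.05):
    """Weighted sum of the contrastive and entailment terms (weighting assumed)."""
    batch = doc_emb.shape[0]
    l_con = info_nce(doc_emb, chunk_emb, tau)
    # Assumed negatives: pair each document with another document's chunk.
    neg_chunks = np.roll(chunk_emb, shift=1, axis=0)
    logits = np.concatenate([
        entailment_logits(doc_emb, chunk_emb, w, b),   # label 1: chunk belongs
        entailment_logits(doc_emb, neg_chunks, w, b),  # label 0: foreign chunk
    ])
    labels = np.concatenate([np.ones(batch), np.zeros(batch)])
    l_ent = bce_with_logits(logits, labels)
    return (1 - lam) * l_con + lam * l_ent, l_con, l_ent


# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
tokens = "the court held that the contract was void for lack of consideration".split()
context, chunk = mask_chunk(tokens, 3, rng)
B, D = 4, 8
doc = rng.normal(size=(B, D))
chk = doc + 0.1 * rng.normal(size=(B, D))  # chunk embeddings near their documents
w = 0.1 * rng.normal(size=4 * D)
total, l_con, l_ent = cpe_loss(doc, chk, w, b=0.0)
```

Keeping the two terms separate in the return value makes it easy to log them independently, which is what an ablation over the contrastive-vs-entailment balance would need; in a real training loop the same computation would run in an autodiff framework rather than NumPy.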
PDF Attachment
You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.