Back to Main Conference 2024
LREC-COLING 2024main

Reduce Redundancy Then Rerank: Enhancing Code Summarization with a Novel Pipeline Framework

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/43cdx36yc6y3

Abstract

Code summarization is the task of automatically generating natural language descriptions from source code. Recently, pre-trained language models have gained significant popularity in code summarization due to their capacity to capture richer semantic representations of both code and natural language. Nonetheless, contemporary code summarization models grapple with two fundamental limitations. (1) Some tokens in the code are irrelevant to the natural language description and damage the alignment of the representation spaces for code and language. (2) Most approaches are based on the encoder-decoder framework, which is often plagued by the exposure bias problem, hampering the effectiveness of their decoding sampling strategies. To address the two challenges, we propose a novel pipeline framework named Reduce Redundancy then Rerank (Reˆ3). Specifically, a redundancy reduction component is introduced to eliminate redundant information in code representation space. Moreover, a re-ranking model is incorporated to select more suitable summary candidates, alleviating the exposure bias problem. The experimental results show the effectiveness of Reˆ3 over some state-of-the-art approaches across six different datasets from the CodeSearchNet benchmark.

Details

Paper ID
lrec2024-main-1198
Pages
pp. 13722-13733
BibKey
hu-etal-2024-reduce
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 25 May 2024

Authors

  • XH

    Xiaoyu Hu

  • XZ

    Xu Zhang

  • ZL

    Zexu Lin

  • DZ

    Deyu Zhou

Links