G&P2P: A Multi-Source Approach to Grapheme-to-Phoneme Conversion
Proceedings of the Third Workshop on Computation and Written Language (CAWL 2026) @ LREC 2026
Abstract
Grapheme-to-phoneme (G2P) conversion plays a central role in speech technologies. This paper introduces G&P2P, a multi-source framework that integrates multiple pronunciation dictionaries to enhance G2P modeling. We evaluate both expert-curated and crowd-sourced resources using attentive LSTM, pointer-generator LSTM, and transformer architectures. Results indicate that combining high-quality expert dictionaries yields substantial improvements, achieving an 11.26-point absolute (22% relative) reduction in word error rate. In contrast, incorporating noisy crowd-sourced resources may degrade performance. Statistical analyses further suggest that dataset quality exerts a greater influence on outcomes than the choice of fusion strategy, offering practical guidance for the design of multi-source G2P systems.
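To make the distinction between an absolute and a relative reduction in word error rate concrete, the following minimal sketch computes both quantities on toy data. It assumes the common G2P convention that a word counts as an error when its predicted phoneme sequence differs from the reference in any position; the scoring protocol actually used in the paper is not specified in this abstract. The function names `word_error_rate` and `wer_reduction`, the toy lexicon, and all numbers are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def word_error_rate(predictions: Sequence[List[str]],
                    references: Sequence[List[str]]) -> float:
    """Return WER as a percentage: the share of words whose predicted
    phoneme sequence is not an exact match for the reference.
    (Assumed definition; not taken from the paper.)"""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one reference word is required")
    errors = sum(1 for p, r in zip(predictions, references) if p != r)
    return 100.0 * errors / len(references)


def wer_reduction(baseline_wer: float, system_wer: float) -> Tuple[float, float]:
    """Return (absolute reduction in points, relative reduction in percent)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    absolute = baseline_wer - system_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative


# Hypothetical toy data: four words with ARPABET-style reference pronunciations.
references = [["K", "AE", "T"], ["D", "AO", "G"], ["F", "IH", "SH"], ["B", "ER", "D"]]
# Hypothetical single-source baseline: two of four words are wrong.
baseline_preds = [["K", "AE", "T"], ["D", "AA", "G"], ["F", "IH", "S"], ["B", "ER", "D"]]
# Hypothetical multi-source system: one of four words is wrong.
system_preds = [["K", "AE", "T"], ["D", "AO", "G"], ["F", "IH", "S"], ["B", "ER", "D"]]

baseline_wer = word_error_rate(baseline_preds, references)
system_wer = word_error_rate(system_preds, references)
absolute, relative = wer_reduction(baseline_wer, system_wer)
print(f"baseline={baseline_wer:.2f}%  system={system_wer:.2f}%  "
      f"absolute={absolute:.2f} points  relative={relative:.2f}%")
```

In this toy case the absolute reduction is the difference in percentage points between the two error rates, while the relative reduction expresses that difference as a fraction of the baseline error rate. The same absolute gain therefore corresponds to a larger relative gain when the baseline is already low.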