Neural Network-assisted Analysis of Tube Vocal Tract Models

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

We present a pipeline for deep neural network-assisted modeling and analysis of the behavior of an acoustic tube. The vocal tract is represented as a series of cylindrical tube segments, each characterized by a fixed length and a variable cross-sectional area. A large synthetic dataset of such tube configurations is generated, and a circuit theory–based algorithm predicts the corresponding formant frequencies. To explore the mapping between vocal tract shapes and formant values, the pipeline integrates both linear regression and nonlinear machine learning models, including multilayer perceptrons. Model interpretability is assessed using SHapley Additive exPlanations (SHAP), which quantifies the contribution of each segment to the predicted formant frequencies. The proposed framework enables detailed exploration of the articulatory-acoustic relationships inherent to an acoustic tube and vocal tract simulacrum. We describe the pipeline in the context of modeling the effects of perturbations on the first three formants of a 16-cm tube divided into 1-cm segments. The pipeline can be applied to any method that predicts the behavior of an acoustic tube conceived as a series of segmented units.
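To make the tube-to-formant step concrete, the following is a minimal sketch of one common circuit-theory formulation: a lossless transmission-line (chain matrix) analogue in which each segment is a two-port network and formants are the zeros of the D element of the cascaded matrix. It is not necessarily the algorithm used in this work. The function names, the physical constants, and the boundary conditions (closed glottis, ideal open lip end, no losses or radiation load) are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 35000.0  # cm/s (assumed value)
AIR_DENSITY = 0.00114     # g/cm^3 (assumed value)


def chain_d_element(areas, seg_len, freq):
    """D element of the cascaded chain matrix, glottis to lips.

    Each lossless segment has the matrix [[cos, j*Z*sin], [j*sin/Z, cos]].
    The product keeps the form [[a, j*b], [j*c, d]] with real a, b, c, d,
    so only real arithmetic is needed.
    """
    k = 2.0 * math.pi * freq / SPEED_OF_SOUND  # wavenumber in rad/cm
    a, b, c, d = 1.0, 0.0, 0.0, 1.0            # identity matrix
    for area in areas:
        z = AIR_DENSITY * SPEED_OF_SOUND / area  # characteristic impedance
        cs, sn = math.cos(k * seg_len), math.sin(k * seg_len)
        sb, sc = z * sn, sn / z
        # Right-multiply the running product by this segment's matrix.
        a, b, c, d = (a * cs - b * sc, a * sb + b * cs,
                      c * cs + d * sc, d * cs - c * sb)
    return d


def formants(areas, seg_len=1.0, n_formants=3, f_max=5000.0, step=5.0):
    """Locate the first zeros of D(f) by grid scan plus bisection.

    With a closed glottis and zero pressure at the lips, U_lips / U_glottis
    equals 1 / D, so zeros of D are the resonances of this idealized tube.
    """
    found = []
    f_lo = step
    d_lo = chain_d_element(areas, seg_len, f_lo)
    while f_lo < f_max and len(found) < n_formants:
        f_hi = f_lo + step
        d_hi = chain_d_element(areas, seg_len, f_hi)
        if d_lo * d_hi < 0.0 or d_hi == 0.0:  # zero lies in (f_lo, f_hi]
            lo, hi, dl = f_lo, f_hi, d_lo
            for _ in range(50):  # bisection refinement
                mid = 0.5 * (lo + hi)
                dm = chain_d_element(areas, seg_len, mid)
                if dl * dm <= 0.0:
                    hi = mid
                else:
                    lo, dl = mid, dm
            found.append(0.5 * (lo + hi))
        f_lo, d_lo = f_hi, d_hi
    return found


if __name__ == "__main__":
    uniform = [3.0] * 16                # 16 segments of 1 cm, 3 cm^2 each
    constricted = [3.0] * 15 + [1.0]    # narrow the segment nearest the lips
    print(formants(uniform))
    print(formants(constricted))
```

For the uniform 16-cm tube, the sketch reproduces the quarter-wavelength resonances (2n - 1)c / 4L, roughly 547, 1641, and 2734 Hz for the assumed speed of sound. Narrowing the segment nearest the lips lowers the first formant. A perturbation dataset of the kind described above could be built by calling `formants` on many randomly perturbed area vectors.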