A dynamic glottal model through high-speed imaging
基于高速数位成像的动态声门模型
Jiangping Kong 孔江平
Abstract 摘要
This paper presents an improved dynamic glottal model based on high-speed imaging (HSI). Speech production is commonly described in three parts: speech source, speech resonance, and lip radiation. Of these, the speech source is the most important because it is the basis of speech. Acoustic models of the speech source are well established in speech production research, but the physiological speech source, that is, the activity of the glottis, has seldom been studied because the vibration of the vocal folds is difficult to observe and sample. An earlier glottal model (Kong 2007) represented the static glottis with four quarter-ellipses in three modes: normal, leakage, and open. Its dynamic glottal control function was approximated by the product of a sine and an exponential function. The problem with that dynamic model is that, although it can simulate the glottis, its control parameters are hard to interpret. In this study, more high-speed images were sampled, the image processing was greatly improved, and the dynamic glottal control function was modeled with parameters that are significant to speech perception.
本文利用高速数位成像技术对动态声门模型进行了研究。众所周知,言语产生包括嗓音声源、声道共鸣和唇辐射三个方面,其中嗓音声源尤其重要,因为嗓音声源是言语产生的基础。在言语产生的研究中,声学模型已经有了很深入的研究,但由于声带振动难于观察和采集样本,嗓音的生理模型研究得很少。多年前作者建立了一个动态声门模型(Kong 2007),在此模型中,静态声门是用四个四分之一椭圆来建模的,并有正常、漏气和敞开三种模式。模型的动态声门控制函数是通过正弦和抛物线的乘积来建模。虽然这种方式有效,但合成嗓音的参数解释性较差。在本项研究中,采集了更多更高品质的声带振动高速数位成像样本和大大改进了数位影像处理的技术,最终模型的动态声门控制函数所用的参数对嗓音声源的感知具有很好的解释性。
Keywords 关键词
High-speed imaging 高速数位成像 Vibration of vocal folds 声带振动 Dynamic glottal model 动态声门模型