Inter-speaker speech rate normalization for phone duration modeling of Lithuanian
Author | Affiliation | |
---|---|---|
LT | ||
Date |
---|
2007 |
Previous research on CART-based phone duration modeling of Lithuanian showed that a model built on multi-speaker data performs worse than a model built on single-speaker data. This indicates that inter-speaker normalization is needed before multi-speaker corpora can be used to train duration models. A three-step procedure was applied for speech rate normalization: 1. Correlation-based clustering of the vectors of average durations calculated for each speaker was used to identify language-specific groups of phones; 2. Speech rate coefficients were calculated, one for every speaker and every group of phones; 3. The data were normalized according to the calculated coefficients. Experiments were performed on the VDU-AB20 corpus, which contains 300 thousand samples of vowels and 400 thousand samples of consonants, and were evaluated with CART-based duration modeling. After inter-speaker normalization, the results of the model built on multi-speaker data (correlation: 0.8603 and 0.787; RMSE: 0.0228 and 0.0179 for vowels and consonants, respectively) were better than the results of the model built on single-speaker data.
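The three normalization steps can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several details here are assumptions made only for illustration: the greedy clustering rule and its `threshold` value, the definition of a speech rate coefficient as the ratio of a speaker's mean duration to the all-speaker mean within a phone group, and normalization by division. The function names and the toy data are hypothetical and are not taken from the VDU-AB20 corpus.

```python
from collections import defaultdict
from math import sqrt
from statistics import fmean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def speaker_phone_means(samples):
    """Map phone -> {speaker: mean duration} from (speaker, phone, duration) samples."""
    acc = defaultdict(lambda: defaultdict(list))
    for spk, ph, dur in samples:
        acc[ph][spk].append(dur)
    return {ph: {s: fmean(d) for s, d in by_spk.items()} for ph, by_spk in acc.items()}


def cluster_phones(means, speakers, threshold=0.8):
    """Step 1 (assumed rule): a phone joins the first group whose first member's
    per-speaker mean vector correlates with its own at or above `threshold`.
    Assumes every speaker has at least one sample of every phone."""
    groups = []
    for ph in sorted(means):
        vec = [means[ph][s] for s in speakers]
        for group in groups:
            seed = [means[group[0]][s] for s in speakers]
            if pearson(vec, seed) >= threshold:
                group.append(ph)
                break
        else:
            groups.append([ph])
    return groups


def rate_coefficients(samples, groups):
    """Step 2 (assumed formula): coefficient = speaker's mean duration in a group
    divided by the mean duration of all speakers in that group."""
    group_of = {ph: i for i, group in enumerate(groups) for ph in group}
    per_speaker, overall = defaultdict(list), defaultdict(list)
    for spk, ph, dur in samples:
        g = group_of[ph]
        per_speaker[(spk, g)].append(dur)
        overall[g].append(dur)
    coeffs = {key: fmean(d) / fmean(overall[key[1]]) for key, d in per_speaker.items()}
    return coeffs, group_of


def normalize(samples, coeffs, group_of):
    """Step 3 (assumed): divide each duration by its speaker/group coefficient."""
    return [(spk, ph, dur / coeffs[(spk, group_of[ph])]) for spk, ph, dur in samples]


# Hypothetical toy data: (speaker, phone, duration in seconds).
samples = [
    ("A", "a", 0.10), ("B", "a", 0.08), ("C", "a", 0.06),
    ("A", "i", 0.08), ("B", "i", 0.064), ("C", "i", 0.048),
    ("A", "t", 0.05), ("B", "t", 0.07), ("C", "t", 0.06),
]
speakers = sorted({spk for spk, _, _ in samples})
groups = cluster_phones(speaker_phone_means(samples), speakers)
coeffs, group_of = rate_coefficients(samples, groups)
normalized = normalize(samples, coeffs, group_of)
```

In this toy example the two vowels vary across speakers in the same way and end up in one group, while the consonant forms its own group. The normalized samples could then be passed to any duration model, for example a regression tree, in place of the raw durations.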