|
オオノ スミオ
Ohno, Sumio
大野 澄雄; Affiliation: School of Computer Science, Department of Computer Science; Position: Professor |
|
| Language | English |
| Date of publication/presentation | 2025/01 |
| Type | Academic paper |
| Peer review | Peer-reviewed |
| Title | Determining the base frequency of the F0 contour generation model for the diverse expression of speech |
| Authorship | Co-authored |
| Journal | Acoustical Science and Technology |
| Publication category | Domestic |
| Publisher | Acoustical Society of Japan |
| Volume, issue, pages | 46(1), pp.78-86 |
| Total pages | 9 |
| Author role | Last author |
| Authors | Yoshiko Arimoto, Yasuo Horiuchi, Sumio Ohno |
| Abstract | A reliable method of determining the base frequency (Fb) for utterances of various speaking styles is critical to enabling stable command labeling in the Fujisaki model. To achieve stable command labeling for diverse expressions of speech, a linear model was fitted using the 10th-percentile F0 of each utterance from three corpora of various speaking styles (read, acted, and spontaneous) as the independent variable to estimate a consistent Fb for each utterance. To assess the robustness of the model for unknown utterances, the model was applied to test data, including both open and corpus-open data not used for the model development, and the difference between the estimated Fb and the Fb annotated by trained labelers was calculated. The obtained estimation model fit the manually labeled Fb values well, with a small root mean squared error (RMSE) of 0.096 and a high coefficient of determination (R²) of 0.89 for the closed dataset. The model also exhibited a small RMSE of 0.091 and a high R² of 0.92 for the corpus-open dataset. These results indicate that the proposed model can reliably estimate the Fb of utterances with various speaking styles. |
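
The estimation procedure summarized in the abstract (a linear fit from each utterance's 10th-percentile F0 to its Fb, evaluated with RMSE and R²) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the use of the natural-log F0 domain and the exclusion of unvoiced frames (F0 = 0) are assumptions, and no coefficients from the paper are reproduced.

```python
import numpy as np


def tenth_percentile_log_f0(f0_hz):
    """10th percentile of ln(F0) over voiced frames of one utterance.

    Assumption: unvoiced frames are coded as F0 <= 0 and are skipped,
    and the percentile is taken in the natural-log domain.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[f0 > 0]
    if voiced.size == 0:
        raise ValueError("utterance has no voiced frames")
    return float(np.percentile(np.log(voiced), 10))


def fit_fb_model(p10, fb):
    """Ordinary least-squares fit of fb ~ slope * p10 + intercept."""
    slope, intercept = np.polyfit(np.asarray(p10, dtype=float),
                                  np.asarray(fb, dtype=float), deg=1)
    return float(slope), float(intercept)


def predict_fb(p10, slope, intercept):
    """Apply the fitted line to new 10th-percentile values."""
    return slope * np.asarray(p10, dtype=float) + intercept


def rmse(y_true, y_pred):
    """Root mean squared error between annotated and estimated Fb."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

In use, one would compute `tenth_percentile_log_f0` for every training utterance, fit the line against the manually annotated Fb values with `fit_fb_model`, and then report `rmse` and `r_squared` separately on the closed data and on held-out (open or corpus-open) data.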