Ohno, Sumio (大野 澄雄)
   Affiliation   School of Computer Science, Department of Computer Science
   Position   Professor
Language: English
Publication date: 2025/01
Publication type: Research paper
Peer review: Peer-reviewed
Title: Determining the base frequency of the F0 contour generation model for the diverse expression of speech
Authorship: Co-authored
Journal: Acoustical Science and Technology
Publication category: Domestic (Japan)
Publisher: Acoustical Society of Japan
Volume(issue), pages: 46(1), pp. 78-86
Total pages: 9
Role: Last author
Authors: Yoshiko Arimoto, Yasuo Horiuchi, Sumio Ohno
Abstract: A reliable method of determining the base frequency (Fb) for utterances in various speaking styles is critical to enabling stable command labeling in the Fujisaki model. To achieve stable command labeling for diverse expressions of speech, a linear model was fitted that takes the 10th-percentile F0 of each utterance as the independent variable and estimates a consistent Fb for that utterance; the model was built from three corpora covering different speaking styles (read, acted, and spontaneous). To assess the model's robustness to unseen utterances, it was applied to test data comprising both open data and corpus-open data that were not used for model development, and the difference between the estimated Fb and the Fb annotated by trained labelers was calculated. The resulting estimation model fit the manually labeled Fb values well, with a small root mean squared error (RMSE) of 0.096 and a high coefficient of determination (R²) of 0.89 on the closed dataset. The model also achieved a small RMSE of 0.091 and a high R² of 0.92 on the corpus-open dataset. These results indicate that the proposed model can reliably estimate the Fb of utterances across various speaking styles.
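The abstract describes the estimation step only at a high level. As a minimal sketch, not the authors' implementation, the Python code below shows one way to fit a linear model from a per-utterance 10th-percentile F0 feature to labeled Fb and to score it with RMSE and R². Several details are assumptions not stated in this record: feature and target are taken in natural-log Hz (the Fujisaki model is conventionally formulated on ln F0), unvoiced frames are encoded as 0 Hz and skipped, the percentile uses linear interpolation between sorted values, and R² is computed as 1 - SS_res/SS_tot. The helper names and the toy data are hypothetical; the paper's fitted coefficients are not given here.

```python
import math
from typing import Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100), linearly interpolating between sorted values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty sequence")
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def utterance_feature(f0_hz: Sequence[float], q: float = 10.0) -> float:
    """Hypothetical helper: q-th percentile of ln F0 over voiced frames.

    Assumes unvoiced frames are encoded as 0 Hz and are skipped.
    """
    voiced = [math.log(f) for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    return percentile(voiced, q)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between labels and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy, made-up data (NOT from the paper): F0 tracks in Hz (0 = unvoiced)
    # paired with hypothetical hand-labeled Fb values in Hz.
    tracks = [
        [0, 110, 120, 140, 135, 0, 100, 95],
        [0, 210, 230, 250, 0, 190, 185, 180],
        [150, 160, 170, 0, 0, 140, 135, 130],
        [0, 300, 320, 280, 260, 0, 240, 230],
    ]
    labeled_fb_hz = [85.0, 160.0, 115.0, 200.0]

    x = [utterance_feature(t) for t in tracks]  # 10th-percentile ln F0
    y = [math.log(fb) for fb in labeled_fb_hz]  # labeled ln Fb
    slope, intercept = fit_line(x, y)
    pred = [slope * xi + intercept for xi in x]
    print(f"ln Fb ~ {slope:.3f} * p10(ln F0) + {intercept:.3f}")
    # Scored on the training utterances only, i.e. a closed-set evaluation.
    print(f"closed-set RMSE={rmse(y, pred):.3f}, R^2={r_squared(y, pred):.3f}")
```

The demo scores the model on the same utterances it was fitted to, which corresponds to the closed condition; an open or corpus-open evaluation in the spirit of the abstract would instead call `rmse` and `r_squared` on utterances (or whole corpora) held out from `fit_line`.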