Name (reading): FUKUSHIMA Edwardo Fumihiko
Name: 福島 E.文彦 (Edwardo F. Fukushima)
Affiliation: Faculty of Engineering, Department of Mechanical Engineering
Position: Professor
Language: English
Date of publication/presentation: 2025/10
Type: Academic journal paper
Peer review: Peer-reviewed
Title: Hetero-Logit Alignment and Ambiguity Fragment Self-Weighting for Speech Emotion Recognition in Human-Robot Interaction
Authorship: Co-authored
Journal: IEEE Internet of Things Journal (Early Access)
Publication category: Overseas
Publisher: IEEE
International co-authorship: Yes
Authors/co-authors: Cheng-Shan Jiang; Zhen-Tao Liu; Edwardo F. Fukushima; Jinhua She
Abstract: Speech Emotion Recognition (SER) is pivotal for achieving empathetic and adaptive Human-Robot Interaction (HRI) within Internet of Things (IoT) ecosystems. However, conventional SER methods rely on large self-supervised speech models whose computational cost hinders deployment in HRI systems. Moreover, utterance-level emotion labels introduce segment-level ambiguities that degrade model training. To mitigate these limitations, we propose the Multi-View Logit-based Heterogeneous Model Alignment (MVLogit-HMA) framework, which transfers knowledge from a large self-supervised Trunk (TKG) encoder to a lightweight Package (PKG) encoder through joint alignment of instance-level contrastive guidance and class-level prototype relations in both the logit space and the probability distribution, thereby harmonizing representations across heterogeneous model architectures. Simultaneously, we introduce Ambiguity Fragment Self-Weighting (AFSW), which dynamically down-weights unreliable segments and enforces discriminative separation between high- and low-ambiguity groups via an adaptive boundary loss. Comprehensive evaluations on the IEMOCAP and RAVDESS datasets confirm the superiority of our method, which achieves weighted accuracy (WA) of 74.17% and 90.02% and unweighted accuracy (UA) of 74.47% and 90.39%, respectively. Furthermore, a preliminary application in a real-world HRI scenario validates the practical viability of our approach.
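The two ideas in the abstract can be loosely illustrated together. The sketch below is a hypothetical simplification, not the authors' implementation: it distills teacher (TKG) logits into student (PKG) logits segment by segment with a temperature-softened KL term, and aggregates the per-segment losses with self-weights that down-weight ambiguous segments via normalized prediction entropy. All function names, the temperature, and the weighting rule are illustrative assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-softened softmax over the last axis (classes).
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ambiguity_weights(logits):
    # Weight each segment by 1 - normalized entropy of its class posterior,
    # so near-uniform (ambiguous) segments contribute less to the loss.
    p = softmax(logits)
    ent = -(p * np.log(p + 1e-12)).sum(axis=-1)
    w = 1.0 - ent / np.log(p.shape[-1])
    return w / (w.sum() + 1e-12)

def weighted_distill_loss(student_logits, teacher_logits, T=2.0):
    # Per-segment KL(teacher || student) on temperature-softened posteriors,
    # aggregated with ambiguity-based self-weights from the teacher.
    ps = softmax(student_logits, T)
    pt = softmax(teacher_logits, T)
    kl = (pt * (np.log(pt + 1e-12) - np.log(ps + 1e-12))).sum(axis=-1)
    w = ambiguity_weights(teacher_logits)
    return float((w * kl).sum() * T ** 2)

# Toy example: 3 speech segments of one utterance, 4 emotion classes.
teacher = np.array([[4.0, 0.0, 0.0, 0.0],    # confident segment
                    [0.5, 0.4, 0.5, 0.45],   # ambiguous (near-uniform) segment
                    [3.0, 0.2, 0.1, 0.0]])   # fairly confident segment
student = np.zeros_like(teacher)
loss = weighted_distill_loss(student, teacher)
```

Because the weights are computed from the teacher's posterior entropy, the near-uniform second segment contributes almost nothing, which mirrors the intent of down-weighting segment-level label ambiguity; the actual AFSW and MVLogit-HMA losses in the paper are more elaborate.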
External link: https://doi.org/10.1109/JIOT.2025.3620303