Abstract
To enhance the naturalness and emotional resonance of virtual characters in real-time human-computer dialogue, this study proposes a speech-driven framework for compound-emotion digital-human interaction. The system first employs a speech emotion recognition module to extract affective features from the user's voice, then performs fine-grained compound-emotion weight analysis with a GPT-based model. The resulting weights are structured as JSON and transmitted via a local API to the Unreal Engine 5 rendering environment, enabling dynamic mapping from speech parameters to MetaHuman facial action units. To evaluate the system's effectiveness, 40 participants rated four perceptual dimensions: naturalness and realism, effectiveness of compound-emotion expression, emotional resonance, and overall interaction performance. All four dimensions scored significantly above the neutral level (p < 0.001), and Cronbach's α exceeded 0.70, indicating acceptable internal consistency. Moreover, large effect sizes (Cohen's d > 0.8) indicate clear advantages in emotional expressiveness and interaction fluency. Overall, the framework achieves cross-modal emotional transmission through speech-driven compound-emotion generation, offering an extensible technical pathway for future research in affective computing and digital-human interaction.
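For concreteness, the JSON payload and local API call described above might resemble the minimal Python sketch below; the field names, emotion weights, endpoint address, and port are illustrative assumptions rather than the system's actual schema.

```python
import json
import urllib.request

# Hypothetical compound-emotion payload produced by the GPT-based weight analysis.
# All keys and values are assumptions for exposition, not the authors' actual format.
emotion_weights = {
    "utterance_id": "demo_001",
    "compound_emotion": {          # per-emotion weights inferred from the user's speech
        "joy": 0.55,
        "surprise": 0.30,
        "sadness": 0.15
    },
    "speech_features": {           # affective features from the speech emotion recognition module
        "pitch_mean_hz": 212.4,
        "energy_rms": 0.37,
        "speaking_rate_sps": 4.1
    }
}

# POST the JSON to an assumed local endpoint that the UE5/MetaHuman side consumes
# and maps onto facial action unit (blendshape) curves.
req = urllib.request.Request(
    "http://127.0.0.1:8080/emotion",              # assumed local API address
    data=json.dumps(emotion_weights).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```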
|
|
Key Words
Affective Engineering, Affective Computing, Experimental Design, VR, Human Factors Engineering
|
|
 |