Journal of System Simulation
Abstract
Full-body co-speech gesture generation significantly enhances the interactivity of virtual digital humans. It requires generated gestures not only to align accurately with speech but also to exhibit realistic full-body dynamics. Existing methods have two limitations: Transformer-based approaches often overlook the temporal features of action sequences, while diffusion-model-based approaches inadequately capture spatial correlations between body parts. To address these limitations, a full-body action generation method integrating diffusion models, Mamba, and attention mechanisms is proposed. The spatial self-attention and temporal state space model layer (STMamba Layer) is introduced as the core of the denoising network to extract inter-part spatial features and intra-part temporal features, thereby enhancing action quality and diversity. Body motion sequences are modeled along two dimensions: spatially, rotational relative positional encoding and self-attention capture correlations among body joint points; temporally, Mamba captures intra-part dynamics in action sequences to improve continuity. Experiments on the large-scale audio-text-action dataset BEAT2 demonstrate that the proposed method outperforms state-of-the-art approaches in both fidelity and diversity while maintaining competitive inference speed.
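The two-dimensional modeling described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' STMamba Layer: the temporal branch uses a plain diagonal linear state-space recurrence in place of Mamba's selective scan, the positional encoding and the diffusion denoising loop are omitted, and all names, shapes, and parameter values (`st_block`, `T`, `P`, `D`, the random weights) are assumptions chosen for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(x, wq, wk, wv):
    """Self-attention across body parts within each frame.

    x: (T, P, D) array of T frames, P body parts, D features per part.
    Returns the attended features (T, P, D) and the weights (T, P, P).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def temporal_ssm_scan(x, a, b, c):
    """Diagonal linear state-space recurrence along time, per part.

    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t
    This is a simplified stand-in for Mamba, NOT its selective scan.
    """
    h = np.zeros(x.shape[1:])
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        y[t] = c * h
    return y


def st_block(x, params):
    """Spatial attention followed by a temporal scan, both residual."""
    s, _ = spatial_self_attention(x, params["wq"], params["wk"], params["wv"])
    x = x + s
    return x + temporal_ssm_scan(x, params["a"], params["b"], params["c"])


# Hypothetical sizes: 16 frames, 4 body parts, 8 features per part.
rng = np.random.default_rng(0)
T, P, D = 16, 4, 8
params = {
    "wq": rng.standard_normal((D, D)) * 0.1,
    "wk": rng.standard_normal((D, D)) * 0.1,
    "wv": rng.standard_normal((D, D)) * 0.1,
    "a": np.full(D, 0.9),  # state decay, |a| < 1 keeps the recurrence stable
    "b": np.ones(D),
    "c": np.ones(D),
}
x = rng.standard_normal((T, P, D))
out = st_block(x, params)
```

In this sketch the attention step mixes information only across parts of the same frame, and the recurrence mixes information only along time within each part, which mirrors the inter-part spatial and intra-part temporal split named in the abstract.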
Recommended Citation
Zhang, Shuozhe; Song, Wenfeng; Hou, Xia; and Li, Shuai (2026) "Full-body Co-speech Gesture Generation Based on Spatial-temporal Enhanced Generation Model," Journal of System Simulation: Vol. 38: Iss. 1, Article 16.
DOI: 10.16182/j.issn1004731x.joss.25-0833
Available at: https://dc-china-simulation.researchcommons.org/journal/vol38/iss1/16
First Page
211
Last Page
224
CLC
TP391.41
Recommended Citation
Zhang Shuozhe, Song Wenfeng, Hou Xia, et al. Full-body Co-speech Gesture Generation Based on Spatial-temporal Enhanced Generation Model[J]. Journal of System Simulation, 2026, 38(1): 211-224.
DOI
10.16182/j.issn1004731x.joss.25-0833
Included in
Artificial Intelligence and Robotics Commons, Computer Engineering Commons, Numerical Analysis and Scientific Computing Commons, Operations Research, Systems Engineering and Industrial Engineering Commons, Systems Science Commons