Humanoid robotics has strong potential to transform daily service and caregiving applications. Although recent advances in general motion tracking (GMT) within physics engines have enabled virtual characters and humanoid robots to reproduce a broad range of human motions, these behaviors remain largely limited to contact-free social interactions or isolated movements. Assistive scenarios, by contrast, require continuous awareness of a human partner and rapid adaptation to their evolving posture and dynamics.
In this paper, we formulate the imitation of closely interacting, force-exchanging human–human motion sequences as a multi-agent reinforcement learning problem. We jointly train partner-aware policies for both the supporter (assistant) agent and the recipient agent in a physics simulator to track assistive motion references. To make this problem tractable, we introduce a partner-policy initialization scheme that transfers priors from single-human motion-tracking controllers, greatly improving exploration. We further propose dynamic reference retargeting and a contact-promoting reward, which adapt the assistant's reference motion to the recipient's real-time pose and encourage physically meaningful support.
We show that AssistMimic is the first method capable of successfully tracking assistive interaction motions on established benchmarks, demonstrating the benefits of a multi-agent RL formulation for physically grounded and socially aware humanoid control.
Overview of AssistMimic. We train tracking-based humanoid control policies for both the recipient and the supporter, optimizing them to imitate a paired reference motion sequence. Our architecture builds on the single-agent tracking framework of PHC, extending it with partner-aware state inputs and augmenting standard imitation rewards with recipient-aware reference retargeting and contact-incentivizing reward terms.
We model close-range assistance as a multi-agent MDP where both the supporter and recipient learn tracking policies jointly. This enables reactive and physically consistent coordination between agents.
We initialize both policies using parameters from a pre-trained single-person tracking controller, allowing them to inherit locomotion skills before acquiring role-specific behaviors.
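A minimal sketch of this initialization, under the assumption that each partner policy is an MLP whose first layer must be widened to accept the additional partner-aware observations; `init_partner_policy` and the zero-padding strategy are illustrative, not the paper's exact implementation. Zero-initializing the new input columns means each policy initially behaves exactly like the single-person prior:

```python
import copy

import torch
import torch.nn as nn


def init_partner_policy(pretrained: nn.Sequential, extra_obs_dim: int) -> nn.Sequential:
    """Copy a pre-trained single-person tracking controller and widen its
    first linear layer with zero-initialized columns for partner-aware
    observations, so the initial outputs match the single-agent prior."""
    policy = copy.deepcopy(pretrained)
    first = policy[0]  # assumes the first module is nn.Linear
    widened = nn.Linear(first.in_features + extra_obs_dim, first.out_features)
    with torch.no_grad():
        widened.weight.zero_()
        # Reuse the pre-trained weights for the original observation slots.
        widened.weight[:, : first.in_features] = first.weight
        widened.bias.copy_(first.bias)
    policy[0] = widened
    return policy
```

Both the supporter and recipient policies would start from the same prior this way, then diverge into role-specific behaviors during joint training.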
We continuously adjust the assistant's hand targets to preserve their intended relative position with respect to the recipient's body, enabling adaptive support.
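One way this retargeting could be realized, as a hedged sketch: express the reference hand target in the recipient's local root frame, then map it back out through the recipient's *simulated* root frame at every step. The function name and the `(position, rotation-matrix)` frame representation are assumptions for illustration:

```python
import numpy as np


def retarget_hand_target(ref_hand_pos, ref_recipient_root, sim_recipient_root):
    """Re-anchor a reference hand target to the recipient's simulated pose.

    Each root frame is a (pos, rot) pair: a 3-vector and a 3x3 rotation.
    """
    ref_pos, ref_rot = ref_recipient_root
    sim_pos, sim_rot = sim_recipient_root
    # Hand target in the recipient's local frame, per the reference motion.
    local = ref_rot.T @ (ref_hand_pos - ref_pos)
    # Same relative offset, applied to the recipient's simulated root frame.
    return sim_rot @ local + sim_pos
```

Because the offset is recomputed every control step, the supporter's hands follow the recipient even when the simulated motion drifts from the reference.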
We add a contact-incentivizing reward that activates when the recipient's body is in close proximity to the assistant's hands, prioritizing functional support over strict kinematic tracking.
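A contact-incentivizing term of this kind might look like the following sketch, where the distance threshold, decay scale, and function name are all hypothetical choices rather than the paper's exact values:

```python
import numpy as np


def contact_reward(hand_pos, body_points, sigma=0.05, threshold=0.15):
    """Pay a proximity bonus when the supporter's hand is within
    `threshold` meters of any recipient body point, decaying
    exponentially with distance (sigma in meters)."""
    d = np.min(np.linalg.norm(body_points - hand_pos, axis=-1))
    return float(np.exp(-d / sigma)) if d < threshold else 0.0
```

Gating the bonus on proximity keeps the term from dominating ordinary tracking, while the exponential shaping still gives a smooth gradient toward contact.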
Failure of Kinematic Baselines. The kinematic replay approach results in the recipient "standing up" independently, regardless of support (left), or causes severe interpenetration that triggers massive restorative forces (right). These failures confirm that a kinematic-replay paradigm is fundamentally unsuited to assistive tasks.
| Method | Seen: SR (%) | Seen: MPJPE (mm) | Unseen: Mass ×1.2 | Unseen: Kp/Kd ×0.5 |
|---|---|---|---|---|
| Sequential Training | 62.4 | 92.3 | 49.9 | 50.5 |
| AssistMimic (Ours) | 74.9 | 113 | 57.9 | 72.8 |
| (-) Dynamic Retargeting | 83.4 | 107 | 73.1 | 83.3 |
| (-) Contact Reward | 81.6 | 80.4 | 66.3 | 77.1 |
| (-) Weight Init | 0.0 | 248 | 0.0 | 0.0 |
Evaluation of specialist policies on the Inter-X dataset.
| Method | Seen: SR (%) | Seen: MPJPE (mm) | Unseen: Mass ×1.5 | Unseen: Max hip τ ×0.5 |
|---|---|---|---|---|
| AssistMimic (Ours) | 85.8 | 127.0 | 67.8 | 73.2 |
| (-) Dynamic Retargeting | 85.4 | 125 | 49.1 | 62.9 |
| (-) Contact Reward | 97.7 | 89.5 | 56.4 | 27.7 |
| (-) Weight Init | 19.1† | 364† | - | - |
Evaluation of specialist policies on the HHI-Assist dataset. † denotes reward-hacking failures.
AssistMimic successfully tracks diverse assistive interactions including supporting from the front, back, or side; using both hands or a single hand; and assisting by grasping the arm or holding around the shoulder.
Qualitative evaluation on HHI-Assist dataset. The red boxes highlight whether the supporter's hands correctly adjust to the recipient's position and provide appropriate support.
AssistMimic successfully converts dense interaction kinematics generated by a motion diffusion model into physically plausible motions, demonstrating its generality and broad applicability.
AssistMimic generalizes to unseen interaction categories such as support-by-arm. Even though this interaction pattern is not observed during training, the humanoid successfully follows the motion while placing a supportive hand on the partner's arm.
We cast physics-based human–human motion imitation as a multi-agent reinforcement learning (MARL) problem, enabling motion imitation for caregiving and assistive tasks that require reactive force exchange.
We introduce motion prior initialization, dynamic reference retargeting, and contact-promoting reward schemes that make multi-agent RL tractable and effective in high-contact settings.
We conduct extensive experiments and ablations that quantify each component's contribution to learning efficiency, imitation fidelity, and assistive stability.
@inproceedings{shibata2026assistmimic,
title={Learning to Assist: Physics-Grounded Human-Human Control
via Multi-Agent Reinforcement Learning},
author={Shibata, Yuto and Yamazaki, Kashu and Jayanti, Lalit
and Aoki, Yoshimitsu and Isogawa, Mariko
and Fragkiadaki, Katerina},
booktitle={Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR)},
year={2026}
}