AssistMimic: Learning to Assist

Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning

CVPR 2026
1Carnegie Mellon University 2Keio AI Research Center 3Keio University
AssistMimic Teaser

AssistMimic is a multi-agent RL framework that learns robust Supporter and Recipient policies from noisy, close-proximity motion sequences. By leveraging single-person motion priors, a novel recipient-adaptive reference retargeting mechanism, and contact-promoting rewards, AssistMimic is the first physics-based controller to successfully track such complex, high-contact reference motions.

Abstract

Humanoid robotics has strong potential to transform daily service and caregiving applications. Although recent advances in general motion tracking (GMT) within physics engines have enabled virtual characters and humanoid robots to reproduce a broad range of human motions, these behaviors are largely limited to contact-free social interactions or isolated movements. Assistive scenarios, by contrast, require continuous awareness of a human partner and rapid adaptation to the partner's evolving posture and dynamics.

In this paper, we formulate the imitation of closely interacting, force-exchanging human–human motion sequences as a multi-agent reinforcement learning problem. We jointly train partner-aware policies for both the supporter (assistant) agent and the recipient agent in a physics simulator to track assistive motion references. To make this problem tractable, we introduce a partner-policy initialization scheme that transfers priors from single-human motion-tracking controllers, greatly improving exploration. We further propose dynamic reference retargeting and a contact-promoting reward, which adapt the assistant's reference motion to the recipient's real-time pose and encourage physically meaningful support.

We show that AssistMimic is the first method capable of successfully tracking assistive interaction motions on established benchmarks, demonstrating the benefits of a multi-agent RL formulation for physically grounded and socially aware humanoid control.

Method

Method Overview

Overview of AssistMimic. We train tracking-based humanoid control policies for both the recipient and the supporter, optimizing them to imitate a paired reference motion sequence. Our architecture builds on the single-agent tracking framework of PHC, extending it with partner-aware state inputs and augmenting standard imitation rewards with recipient-aware reference retargeting and contact-incentivizing reward terms.

Multi-Agent RL Formulation

We model close-range assistance as a multi-agent MDP where both the supporter and recipient learn tracking policies jointly. This enables reactive and physically consistent coordination between agents.
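The partner-aware coupling can be sketched as follows: each agent observes its own proprioceptive state, its own tracking target, and the partner's current state. This is a minimal illustration with hypothetical feature sizes and a flat-vector state layout; the paper's exact state design may differ.

```python
import numpy as np

def partner_aware_obs(self_state, ref_target, partner_state):
    """Build one agent's observation by concatenating its own
    proprioception, its tracking target, and the partner's state
    (a sketch; shapes and contents are illustrative assumptions)."""
    return np.concatenate([self_state, ref_target, partner_state])

# Toy step: supporter and recipient each act on their own
# partner-aware observation, so either can react to the other.
supporter_obs = partner_aware_obs(np.zeros(8), np.ones(8), np.full(8, 0.5))
recipient_obs = partner_aware_obs(np.full(8, 0.5), np.ones(8), np.zeros(8))
```

Because each observation includes the partner's live state, a disturbance to one agent immediately changes the other's input, which is what makes joint training yield reactive coordination.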

Motion Prior Initialization

We initialize both policies using parameters from a pre-trained single-person tracking controller, allowing them to inherit locomotion skills before acquiring role-specific behaviors.
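One way such a transfer can work, sketched below under the assumption that the pre-trained controller's first layer is reused and the extra partner-aware inputs get zero-initialized columns, so each policy initially behaves exactly like the single-person prior and only later learns to use its partner's state. The weight layout is a hypothetical dense-layer format, not the paper's confirmed architecture.

```python
import numpy as np

def init_partner_aware_policy(pretrained_w_in, n_partner_feats):
    """Transfer a single-person controller's input-layer weights and
    zero-pad new columns for partner-aware features (an illustrative
    assumption about how the transfer is done)."""
    hidden, n_in = pretrained_w_in.shape
    w = np.zeros((hidden, n_in + n_partner_feats))
    w[:, :n_in] = pretrained_w_in  # inherited locomotion prior
    return w                       # partner inputs start with no effect
```

Zero-initializing the new columns means early exploration is driven by the inherited locomotion skills rather than by random reactions to the partner.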

Dynamic Reference Retargeting

We continuously adjust the assistant's hand targets to preserve their intended relative position with respect to the recipient's body, enabling adaptive support.
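The core idea can be sketched as a per-step coordinate shift: the hand target's offset relative to a recipient body anchor (e.g. the upper arm) is taken from the reference and re-applied at the recipient's current anchor position. This translation-only version is a simplification; the full mechanism would presumably also account for the anchor's rotation.

```python
import numpy as np

def retarget_hand_target(ref_hand, ref_anchor, cur_anchor):
    """Preserve the hand target's intended offset relative to the
    recipient's body anchor, re-expressed at the recipient's current
    pose (translation-only sketch; rotation handling omitted)."""
    offset = ref_hand - ref_anchor  # intended relative position
    return cur_anchor + offset      # target in the live scene
```

If the recipient drifts from the reference trajectory, the assistant's hand target drifts with them, so support stays aimed at the body rather than at a stale world-space point.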

Contact-Promoting Rewards

We introduce contact-incentivizing rewards when the recipient's body is in close proximity to the assistant's hands, prioritizing functional support over strict kinematic tracking.
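A typical proximity-based shaping term of this kind can be sketched as an exponential kernel over hand-to-body distance; the kernel shape and the `sigma` scale here are illustrative assumptions, not the paper's exact reward.

```python
import numpy as np

def contact_reward(hand_pos, body_pos, sigma=0.05):
    """Reward that peaks at 1.0 when the assistant's hand touches the
    recipient's body and decays with distance (sigma in meters is an
    illustrative choice)."""
    d = np.linalg.norm(np.asarray(hand_pos) - np.asarray(body_pos))
    return float(np.exp(-(d / sigma) ** 2))
```

Added on top of the imitation objective, such a term lets the policy trade a little kinematic accuracy for maintaining functional contact.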

Results

Why Multi-Agent RL?

Kinematic Replay Failure

Failure of Kinematic Baselines. The kinematic replay approach results in the recipient "standing up" independently regardless of support (left), or causes severe interpenetration that triggers massive restorative forces (right). These failures confirm that the kinematic-replay paradigm is fundamentally unsuitable for assistive tasks.

Quantitative Evaluation

| Method | Seen: SR (%) | Seen: MPJPE (mm) | Unseen: SR (%), Mass ×1.2 | Unseen: SR (%), Kp/Kd ×0.5 |
|---|---|---|---|---|
| Sequential Training | 62.4 | 92.3 | 49.9 | 50.5 |
| AssistMimic (Ours) | 74.9 | 113 | 57.9 | 72.8 |
| (-) Dynamic Retargeting | 83.4 | 107 | 73.1 | 83.3 |
| (-) Contact Reward | 81.6 | 80.4 | 66.3 | 77.1 |
| (-) Weight Init | 0.0 | 248 | 0.0 | 0.0 |

Evaluation of specialist policies on the Inter-X dataset. SR is success rate; MPJPE is mean per-joint position error. Ablation rows remove components cumulatively, top to bottom.

| Method | Seen: SR (%) | Seen: MPJPE (mm) | Unseen: SR (%), Mass ×1.5 | Unseen: SR (%), Max hip τ ×0.5 |
|---|---|---|---|---|
| AssistMimic (Ours) | 85.8 | 127.0 | 67.8 | 73.2 |
| (-) Dynamic Retargeting | 85.4 | 125 | 49.1 | 62.9 |
| (-) Contact Reward | 97.7 | 89.5 | 56.4 | 27.7 |
| (-) Weight Init | 19.1 | 364 | - | - |

Evaluation of specialist policies on the HHI-Assist dataset. Ablation rows remove components cumulatively; marked entries denote reward-hacking failures.

Qualitative Results on Inter-X

Inter-X Qualitative Results

AssistMimic successfully tracks diverse assistive interactions including supporting from the front, back, or side; using both hands or a single hand; and assisting by grasping the arm or holding around the shoulder.

Qualitative Results on HHI-Assist

HHI-Assist Qualitative Results

Qualitative evaluation on HHI-Assist dataset. The red boxes highlight whether the supporter's hands correctly adjust to the recipient's position and provide appropriate support.

Tracking Generated Interactions

Generated Motion Tracking

AssistMimic successfully converts dense interaction kinematics generated by a motion diffusion model into physically plausible motions, demonstrating its generality and broad applicability.

Generalization to Unseen Actions

Generalization Results

AssistMimic generalizes to unseen interaction categories such as support-by-arm. Even though this interaction pattern is not observed during training, the humanoid successfully follows the motion while placing a supportive hand on the partner's arm.

Key Contributions

Formulation

We cast physics-based human–human motion imitation as a multi-agent reinforcement learning (MARL) problem, enabling motion imitation for caregiving and assistive tasks that require reactive force exchange.

Methodology

We introduce motion prior initialization, dynamic reference retargeting, and contact-promoting reward schemes that make multi-agent RL tractable and effective in high-contact settings.

Evaluation

We conduct extensive experiments and ablations that quantify each component's contribution to learning efficiency, imitation fidelity, and assistive stability.

BibTeX

@inproceedings{shibata2026assistmimic,
    title={Learning to Assist: Physics-Grounded Human-Human Control
           via Multi-Agent Reinforcement Learning},
    author={Shibata, Yuto and Yamazaki, Kashu and Jayanti, Lalit
            and Aoki, Yoshimitsu and Isogawa, Mariko
            and Fragkiadaki, Katerina},
    booktitle={Proceedings of the IEEE/CVF Conference on
               Computer Vision and Pattern Recognition (CVPR)},
    year={2026}
}