HumanMM Enables 3D Human Motion Reconstruction from Multi-Shot Videos

From Multi-Shot Videos to Global Motion Capture: HumanMM

Reconstructing three-dimensional human motion from video is a complex research area with applications ranging from film animation to medical diagnostics. Capturing movement over longer sequences, known as "Long-Sequence 3D Human Motion Reconstruction," is particularly challenging. A new framework called HumanMM aims to address this challenge by reconstructing motion from multi-shot videos, i.e., videos composed of several shots joined by cuts, often filmed from different camera perspectives.

The Challenges of Multi-Shot Reconstruction

Conventional motion capture methods mostly focus on single-shot videos, where the movement is recorded in one continuous camera shot. Multi-shot videos, on the other hand, offer richer information through different viewpoints but also introduce new difficulties. Abrupt shot transitions, partial occlusions of the body, and dynamic backgrounds make continuous and accurate reconstruction of the movement difficult.

The difficulty lies in correctly aligning the different perspectives of the individual shots and creating a unified, global representation of the movement in three-dimensional space. Previous approaches often simplified this problem by aligning the individual shots only in camera space, which can lead to inaccuracies in the global motion reconstruction.
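The difference between camera space and world space can be made concrete with a small sketch. It assumes the common pinhole convention p_cam = R * p_world + t for each shot's camera; the function names, the two camera poses, and the pelvis position are hypothetical values chosen for illustration and are not taken from HumanMM.

```python
import math

def rot_y(deg):
    """Rotation matrix for a rotation of `deg` degrees about the vertical (y) axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def world_to_camera(p_world, R, t):
    """Apply the assumed extrinsics: p_cam = R @ p_world + t."""
    return [sum(R[i][j] * p_world[j] for j in range(3)) + t[i] for i in range(3)]

def camera_to_world(p_cam, R, t):
    """Invert the extrinsics: p_world = R^T @ (p_cam - t)."""
    d = [p_cam[j] - t[j] for j in range(3)]
    return [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]

# One fixed pelvis position in the world (hypothetical values, in meters).
pelvis_world = [0.5, 0.9, 2.0]

# Two shots of the same moment, filmed by differently placed cameras.
shot_a = (rot_y(0.0), [0.0, 0.0, 0.0])
shot_b = (rot_y(90.0), [1.0, 0.0, 3.0])

cam_a = world_to_camera(pelvis_world, *shot_a)
cam_b = world_to_camera(pelvis_world, *shot_b)

# The camera-space coordinates disagree, although the person has not moved.
print("camera space, shot A:", cam_a)
print("camera space, shot B:", cam_b)

# Mapping back with each shot's own camera pose recovers one consistent world position.
back_a = camera_to_world(cam_a, *shot_a)
back_b = camera_to_world(cam_b, *shot_b)
print("world space, from A: ", back_a)
print("world space, from B: ", back_b)
```

The sketch shows why a global reconstruction needs an estimate of each shot's camera pose: positions that look inconsistent in camera space coincide once they are expressed in a shared world frame.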

HumanMM: A New Approach

HumanMM combines camera pose estimation with human motion recovery (HMR). An integrated shot transition detector identifies the boundaries between individual shots, which enables robust alignment of the different perspectives so that pose and orientation remain continuous across shot boundaries.
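Detecting the transitions between shots can be illustrated with a classic baseline: compare the intensity histograms of consecutive frames and flag a cut when they differ strongly. This is only a minimal sketch of the general idea; the article does not describe how HumanMM's detector works internally, and the function names, the bin count, and the threshold below are assumptions made for illustration.

```python
def gray_histogram(frame, bins=8):
    """Normalized intensity histogram of a frame given as a flat list of 0-255 values."""
    hist = [0] * bins
    for v in frame:
        hist[min(v * bins // 256, bins - 1)] += 1
    n = len(frame)
    return [h / n for h in hist]

def detect_shot_boundaries(frames, threshold=0.5):
    """Return every index i at which frame i is assumed to start a new shot."""
    boundaries = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        # Half the L1 distance between two normalized histograms lies in [0, 1].
        dist = 0.5 * sum(abs(a - b) for a, b in zip(prev, cur))
        if dist > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries

# Toy clip: three dark frames followed by a hard cut to three bright frames.
dark = [20, 30, 25, 40] * 4
bright = [200, 220, 210, 230] * 4
clip = [dark, dark, dark, bright, bright, bright]

boundaries = detect_shot_boundaries(clip)
print(boundaries)  # [3]
```

A histogram test of this kind handles hard cuts but tends to miss gradual transitions such as fades, which is one reason practical systems use more elaborate detectors.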

Another important component of HumanMM is a custom motion integrator. It reduces "foot sliding," a common artifact in motion reconstruction in which the feet of the virtual model appear to slide along the ground while they should be planted. The integrator also improves the temporal consistency of the reconstructed pose, contributing to a more realistic result.
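To make the artifact measurable, foot sliding can be quantified as the horizontal distance a foot travels between frames while it is supposed to be on the ground. The sketch below is not HumanMM's motion integrator, only a simple way to measure the problem it targets; the function name, the y-up convention, and the 5 cm contact threshold are assumptions made for illustration.

```python
import math

def foot_sliding(foot_positions, contact_height=0.05):
    """Mean horizontal displacement per frame step while the foot is in ground contact.

    foot_positions: list of (x, y, z) tuples in meters, with y pointing up.
    A frame counts as "in contact" when the foot height is at most contact_height.
    """
    slides = []
    for prev, cur in zip(foot_positions, foot_positions[1:]):
        if prev[1] <= contact_height and cur[1] <= contact_height:
            # Horizontal motion only: the vertical axis is ignored.
            slides.append(math.hypot(cur[0] - prev[0], cur[2] - prev[2]))
    return sum(slides) / len(slides) if slides else 0.0

# A planted foot: it stays in place while on the ground.
planted = [(0.0, 0.02, 0.0), (0.0, 0.02, 0.0), (0.0, 0.03, 0.0)]

# A sliding foot: it drifts by 3 cm and then 5 cm while still on the ground.
sliding = [(0.0, 0.02, 0.0), (0.03, 0.02, 0.0), (0.06, 0.02, 0.04)]

print(foot_sliding(planted))
print(foot_sliding(sliding))
```

For the planted foot the measure is zero, while the sliding foot averages about 4 cm of drift per frame step. A lower value indicates a more physically plausible reconstruction.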

Evaluation and Outlook

To evaluate HumanMM, extensive tests were conducted on a custom-built multi-shot dataset compiled from publicly available 3D human motion datasets. According to the authors, the results show that HumanMM can reconstruct realistic human motion in world coordinates, even from complex multi-shot videos.

The developers of HumanMM see their system as an important building block for future markerless motion capture methods. The ability to extract complex movements from everyday videos opens up new perspectives for various application areas, including the development of realistic animations, the analysis of movement sequences in sports and medicine, and the creation of interactive virtual environments.

Bibliography:
- Pavlakos, G., et al. "Human Mesh Recovery from Multiple Shots." CVPR 2022.
- Zhang, Y., et al. "HumanMM: Global Human Motion Recovery from Multi-shot Videos." arXiv preprint arXiv:2503.07597 (2025).