Personalized Videos: DreamRelation Enables Relationship-Centric Video Creation
The creation of personalized videos that depict user-defined relationships between two subjects is an important step toward understanding real-world visual content. While existing methods can already personalize a subject's appearance and motion, they struggle with complex, relationship-centric video generation, which demands both precise relationship modeling and strong generalization across different subject categories.
The difficulty lies in the complex spatial arrangements, layout variations, and nuanced temporal dynamics inherent in relationships. Current models tend to overemphasize irrelevant visual details of the individual subjects instead of capturing the meaningful interactions between them.
DreamRelation: A New Approach
To address these challenges, the authors developed DreamRelation, a novel approach that personalizes relationships from a small set of exemplar videos. It rests on two key components: Relational Decoupling Learning and Relational Dynamics Enhancement.
Relational Decoupling Learning
In Relational Decoupling Learning, relationships are decoupled from the subjects' appearance. This is achieved with a Relation-LoRA triplet and a hybrid masking strategy during training, which improves generalization across different relationships. To determine the optimal design of the triplet, the developers analyzed the distinct roles of the query, key, and value features within the MM-DiT attention mechanism. This makes DreamRelation the first framework for relational video generation with an explainable design.
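The core mechanism here, low-rank adapters attached separately to the query, key, and value projections of an attention layer, can be sketched in a few lines. The following toy single-head attention is illustrative only: the class and parameter names are hypothetical, and it does not reproduce DreamRelation's actual architecture or masking strategy.

```python
import numpy as np

def lora_delta(x, A, B, alpha=1.0):
    """Low-rank LoRA update: x @ (A @ B), scaled by alpha."""
    return alpha * (x @ A @ B)

class RelationLoRAAttention:
    """Toy single-head attention with one low-rank adapter per
    projection (a 'Relation-LoRA triplet' in spirit).
    All names and shapes are illustrative, not DreamRelation's API."""
    def __init__(self, d_model=8, rank=2, seed=0):
        rng = np.random.default_rng(seed)
        # frozen base projections (stand-ins for pretrained weights)
        self.Wq = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.Wk = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.Wv = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        # trainable low-rank adapters; B starts at zero, so the
        # adapted layer initially behaves exactly like the base layer
        self.adapters = {
            name: (rng.standard_normal((d_model, rank)) * 0.01,
                   np.zeros((rank, d_model)))
            for name in ("q", "k", "v")
        }
        self.d = d_model

    def __call__(self, x):
        Aq, Bq = self.adapters["q"]
        Ak, Bk = self.adapters["k"]
        Av, Bv = self.adapters["v"]
        q = x @ self.Wq + lora_delta(x, Aq, Bq)
        k = x @ self.Wk + lora_delta(x, Ak, Bk)
        v = x @ self.Wv + lora_delta(x, Av, Bv)
        # scaled dot-product attention with a numerically stable softmax
        scores = q @ k.T / np.sqrt(self.d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w = w / w.sum(axis=-1, keepdims=True)
        return w @ v
```

Keeping the three adapters separate is what makes the design analyzable: each of the query, key, and value pathways can be inspected or trained independently, which is the property the paper's attention analysis relies on.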
Relational Dynamics Enhancement
Relational Dynamics Enhancement introduces a spatio-temporal relational contrastive loss that prioritizes relationship dynamics while reducing the model's dependence on the subjects' detailed appearance.
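The general shape of such a loss can be illustrated with a standard InfoNCE-style contrastive objective: an anchor relation feature is pulled toward a positive (same relationship dynamics, different appearance) and pushed away from negatives (e.g. appearance features). This is a generic sketch under those assumptions, not the paper's exact formulation.

```python
import numpy as np

def relational_contrastive_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss (illustrative, not the paper's
    exact loss): maximize similarity between the anchor and a positive
    sharing its relation dynamics, minimize it against negatives such
    as appearance features. tau is the softmax temperature."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    pos = np.exp(cos(anchor, positive) / tau)
    neg = sum(np.exp(cos(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + neg))
```

When the positive truly matches the anchor's dynamics, the loss is near zero; when the anchor instead resembles an appearance negative, the loss grows, pushing the model to encode relationship dynamics rather than appearance.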
Convincing Results
Extensive experiments show that DreamRelation outperforms existing methods in relationship-centric video creation. The technology promises to significantly expand the possibilities of personalized video production and open up new application areas in fields such as entertainment, education, and communication. The code and models are to be made publicly available, which will promote further research and development in this area.
Bibliography:
- Wei, Yujie, et al. "DreamRelation: Relation-Centric Video Customization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. arXiv:2503.07602.