Aligning Multimodal Large Language Models with Human Preferences using MM-RLHF

Multimodal Large Language Models (MLLMs) have made impressive progress in recent years. They can understand images, generate text, and even handle more complex tasks that require both visual and linguistic information. Despite these advancements, most current MLLMs lack thorough alignment with human preferences. Previous research on language model alignment has mainly focused on specific areas like reducing hallucinations. The broader question of whether aligning models with human preferences can systematically improve the capabilities of MLLMs has remained largely unanswered.
A research team has addressed this challenge and developed MM-RLHF, a dataset of 120,000 fine-grained, human-annotated preference comparison pairs. This dataset represents a significant advance over existing resources, offering more data, greater diversity, more detailed annotations, and higher quality. Building on this dataset, the team introduced several innovations to improve both the quality of reward models and the efficiency of alignment algorithms.
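To make the dataset description concrete, here is a minimal sketch of what a single comparison pair might look like as a Python record. The field names and example values are illustrative assumptions only, not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One human-annotated comparison pair (illustrative schema, not the official one)."""
    image_path: str        # visual input shared by both candidate responses
    prompt: str            # the user query about the image
    chosen: str            # response preferred by the annotators
    rejected: str          # response ranked lower
    critique: str          # free-form annotator rationale for the ranking
    score_chosen: float    # fine-grained rating of the preferred response
    score_rejected: float  # fine-grained rating of the rejected response


# Hypothetical example record
pair = PreferencePair(
    image_path="images/000123.jpg",
    prompt="What safety hazards are visible in this scene?",
    chosen="There is a frayed power cable lying next to the sink, which is a shock hazard.",
    rejected="The image shows a kitchen.",
    critique="The first response identifies a concrete hazard; the second is vague and unhelpful.",
    score_chosen=4.5,
    score_rejected=2.0,
)
```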
A Critique-Based Reward Model
One of the key innovations of MM-RLHF is a critique-based reward model. Instead of simply assigning scalar ratings, this model first generates a textual critique of a candidate response and only then assigns a score. This approach is more interpretable and provides more informative feedback than conventional scalar reward mechanisms: the critiques make explicit where a response is strong or weak, which helps the policy model adapt more effectively to human preferences.
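The following is a minimal sketch of the critique-then-score idea, assuming a generic Hugging Face-style causal LM with a small scalar scoring head. The prompt templates, the `reward_head` attribute, and the omission of image handling are simplifying assumptions, not the paper's implementation.

```python
import torch


def critique_then_score(model, tokenizer, prompt, response):
    """Two-stage scoring: generate a textual critique first, then a scalar reward.

    Visual inputs are omitted for brevity; a real multimodal reward model would
    also consume processed image tensors. `reward_head` is a hypothetical
    linear layer mapping the last hidden state to a scalar.
    """
    # Stage 1: ask the model to critique the candidate response.
    critique_prompt = (
        f"Question: {prompt}\nResponse: {response}\n"
        "Critique the response: list factual errors, omissions, and strengths."
    )
    inputs = tokenizer(critique_prompt, return_tensors="pt")
    critique_ids = model.generate(**inputs, max_new_tokens=256)
    critique = tokenizer.decode(critique_ids[0], skip_special_tokens=True)

    # Stage 2: condition the scalar reward on the generated critique.
    scoring_prompt = (
        f"{critique_prompt}\nCritique: {critique}\n"
        "Given the critique, rate the response from 1 to 5."
    )
    inputs = tokenizer(scoring_prompt, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs, output_hidden_states=True).hidden_states[-1]
    reward = model.reward_head(hidden[:, -1, :])  # linear head -> scalar score
    return critique, reward.item()
```

The design point is that the scalar score is computed after, and conditioned on, the critique, so the feedback signal carries an explicit, human-readable justification alongside the number.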
Dynamic Reward Scaling
Another important innovation is dynamic reward scaling. This method adjusts the loss weight of each sample according to the reward signal, so that high-quality, informative comparison pairs contribute more to training. By concentrating the optimization on the most informative data, the model learns faster and more effectively.
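The sketch below shows one way such per-sample weighting could be wired into a DPO-style preference loss. The tanh-based weighting function and the hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def dpo_loss_with_dynamic_scaling(policy_logratio, ref_logratio, reward_margin,
                                  beta=0.1, k=0.5):
    """DPO-style preference loss with a per-sample weight from the reward margin.

    policy_logratio: log p_theta(chosen) - log p_theta(rejected), shape [B]
    ref_logratio:    the same quantity under the frozen reference model, shape [B]
    reward_margin:   reward(chosen) - reward(rejected) from the reward model, shape [B]
    """
    logits = beta * (policy_logratio - ref_logratio)
    per_sample = -F.logsigmoid(logits)          # standard DPO preference term
    # Dynamic scaling: pairs with a large, confident reward margin get larger
    # weights, so the most informative comparisons dominate the gradient.
    weights = 1.0 + k * torch.tanh(reward_margin)
    return (weights * per_sample).mean()


# Toy usage with random tensors standing in for real batch statistics.
B = 4
loss = dpo_loss_with_dynamic_scaling(
    policy_logratio=torch.randn(B),
    ref_logratio=torch.randn(B),
    reward_margin=torch.rand(B) * 2.0,
)
```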
Evaluation and Results
The MM-RLHF approach was rigorously evaluated across 10 different dimensions and 27 benchmarks. The results show significant and consistent improvements in model performance. In particular, fine-tuning LLaVA-ov-7B with MM-RLHF and the new alignment algorithm led to a 19.5% increase in conversational abilities and a 60% improvement in safety.
The researchers have made the preference dataset, the evaluation model, the training and evaluation code, and benchmarks for model evaluation and safety publicly available. This allows other researchers to build on these results and further advance the development of MLLMs.
The development of MM-RLHF represents a significant step towards better aligning MLLMs with human preferences. The combination of a comprehensive dataset, a critique-based evaluation model, and dynamic rating scaling allows for comprehensive improvements in the capabilities of MLLMs and makes them usable for a wider range of applications. Future research will focus on further refining these approaches and making human-computer interaction even more seamless.
Bibliography:
https://huggingface.co/papers/2502.10391
https://huggingface.co/papers
https://papers.cool/arxiv/cs.CL
https://arxiv.org/pdf/2411.14432?
https://aclanthology.org/2024.findings-acl.775.pdf
https://openreview.net/pdf/dfb3ff433f662041508bf2dc184f9f07e933bc53.pdf
https://arxiv.org/abs/2309.14525
https://neurips.cc/virtual/2023/papers.html
https://icml.cc/Downloads/2024
https://neurips.cc/virtual/2024/events/datasets-benchmarks-2024