
PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection

MCML Authors

Abstract

Visual instruction tuning adapts pre-trained Multimodal Large Language Models (MLLMs) to follow human instructions for real-world applications. However, the rapid growth of these datasets introduces significant redundancy, leading to increased computational costs. Existing methods for selecting instruction data aim to prune this redundancy, but predominantly rely on computationally demanding techniques such as proxy-based inference or training-based metrics. Consequently, the substantial computational costs incurred by these selection processes often exacerbate the very efficiency bottlenecks they are intended to resolve, posing a significant challenge to the scalable and effective tuning of MLLMs. To address this challenge, we first identify a critical, yet previously overlooked, factor: the anisotropy inherent in visual feature distributions. We find that this anisotropy induces a Global Semantic Drift, and overlooking this phenomenon is a key factor limiting the efficiency of current data selection methods. Motivated by this insight, we devise PRISM, the first training-free framework for efficient visual instruction selection. PRISM surgically removes the corrupting influence of global background features by modeling the intrinsic visual semantics via implicit re-centering. Empirically, PRISM reduces the end-to-end time for data selection and model tuning to just 30% of conventional pipelines. More remarkably, it achieves this efficiency while simultaneously enhancing performance, surpassing models fine-tuned on the full dataset across eight multimodal and three language understanding benchmarks, culminating in a 101.7% relative improvement over the baseline.
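The abstract attributes the gains to removing a shared global component from anisotropic visual features. The paper's actual PRISM procedure is not reproduced here; as a hypothetical illustration only, the sketch below shows the underlying idea with simple mean-centering: when every feature vector carries a large shared offset, pairwise cosine similarity is dominated by that offset, and re-centering restores similarities that reflect sample-specific content. The `recenter` and `mean_cosine` helpers are made up for this demonstration.

```python
import numpy as np

def recenter(features: np.ndarray) -> np.ndarray:
    # Subtract the dataset-mean vector so that pairwise similarities
    # reflect per-sample content rather than a shared global offset.
    # (Illustrative stand-in; not the paper's implicit re-centering.)
    return features - features.mean(axis=0, keepdims=True)

def mean_cosine(x: np.ndarray) -> float:
    # Average off-diagonal cosine similarity across all pairs.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    s = x @ x.T
    n = len(x)
    return float((s.sum() - np.trace(s)) / (n * (n - 1)))

rng = np.random.default_rng(0)
# Toy "anisotropic" features: small per-sample signal plus a large
# shared offset that all vectors have in common.
feats = rng.normal(scale=0.1, size=(200, 32)) + 5.0

print(mean_cosine(feats) > 0.95)              # shared offset dominates
print(abs(mean_cosine(recenter(feats))) < 0.1)  # drift removed
```

Running this prints `True` twice: before re-centering, all vectors point in nearly the same direction (mean cosine close to 1), and after re-centering the residual similarities scatter around zero.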



Preprint

Feb. 2025

Authors

J. Bi • Y. Wang • D. Yan • Aniri • W. Huang • Z. Jin • X. Ma • A. Hecker • M. Ye • X. Xiao • H. Schütze • V. Tresp • Y. Ma

Links

arXiv • GitHub

Research Areas

A3 | Computational Models

B2 | Natural Language Processing

BibTeX Key: BWY+25
