Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability

Dong Shu, Haiyan Zhao, Jingyu Hu, Weiru Liu, Ali Payani, Lu Cheng, Mengnan Du*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference Contribution (Conference Proceeding)

Abstract

Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in processing both visual and textual information. However, the critical challenge of aligning visual and textual representations is not yet fully understood. This survey presents a comprehensive examination of alignment and misalignment in LVLMs through an explainability lens. We first examine the fundamentals of alignment, exploring its representational and behavioral aspects, training methodologies, and theoretical foundations. We then analyze misalignment phenomena across three semantic levels: object, attribute, and relational misalignment. Our investigation reveals that misalignment emerges from challenges at three levels: the data level, the model level, and the inference level. We review existing mitigation strategies, categorizing them into parameter-frozen and parameter-tuning approaches. Finally, we outline promising future research directions, emphasizing the need for standardized evaluation protocols and in-depth explainability studies.
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: EMNLP 2025
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Publisher: Association for Computational Linguistics
Pages: 1713-1735
Number of pages: 23
ISBN (Electronic): 979-8-89176-332-6
Publication status: Published, 4 Nov 2025
Event: The 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou, China
Duration: 4 Nov 2025 to 9 Nov 2025
https://2025.emnlp.org/

Conference

Conference: The 2025 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2025
Country/Territory: China
City: Suzhou
Period: 4/11/25 to 9/11/25

Research Groups and Themes

  • Intelligent Systems Laboratory
