Abstract
The use of language to extend models to unseen domains has attracted significant interest in recent years. Previous methods commonly achieve this through semantically guided distributional shifts of training features. However, the intrinsic modal disparity between language and pixel-level images often causes the feature manifold to diverge when semantic guidance is used to augment features. This paper presents the IMbuing, Enrichment, and Calibration (IMEC) strategy as a concise solution to these issues. Unlike previous approaches, IMEC reverses the target-domain style-mining process to retain semantic content within a more structured framework. Guided by global semantics, we conditionally generate style vectors and imbue them into visual features; IMEC then introduces minor perturbations to disperse these vectors using local semantics, and selectively calibrates the semantic content of features through a dimensional activation strategy. IMEC integrates abstract semantic knowledge with detailed image content, bridging the gap between synthetic and real samples in the target domain and mitigating the content collapse caused by semantic-visual disparities. Our model is evaluated on semantic segmentation, object detection, and image classification tasks across challenging datasets, demonstrating superior performance over existing methods in both the target and source domains. The code for IMEC is available at https://github.com/LanchJL/IMEC-ZSDE.
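The imbue-enrich-calibrate pipeline described in the abstract can be illustrated with a heavily simplified sketch. This is not the authors' implementation: the AdaIN-style statistic replacement, the Gaussian noise standing in for local-semantic perturbation, and all function names here are illustrative assumptions about what such a three-stage pipeline might look like.

```python
import numpy as np

def imbue_style(feat, style_mu, style_sigma, eps=1e-5):
    """Imbuing (sketch): replace each channel's statistics of a visual
    feature map `feat` (C, H, W) with a semantically generated style
    vector of per-channel means and standard deviations."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    normalized = (feat - mu) / (sigma + eps)
    return style_sigma[:, None, None] * normalized + style_mu[:, None, None]

def enrich(style_mu, style_sigma, scale=0.05, seed=None):
    """Enrichment (sketch): disperse the style vector with minor
    perturbations. Gaussian noise is an assumed stand-in for the
    paper's local-semantic guidance."""
    rng = np.random.default_rng(seed)
    return (style_mu + scale * rng.standard_normal(style_mu.shape),
            style_sigma + scale * rng.standard_normal(style_sigma.shape))

def calibrate(feat, gate):
    """Calibration (sketch): a per-channel activation gate in [0, 1]
    selectively retains semantic content along feature dimensions."""
    return gate[:, None, None] * feat
```

In this toy form, imbuing steers the feature toward a target-domain style, enrichment diversifies the synthesized styles, and calibration suppresses dimensions where the style shift would distort semantic content.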
| Original language | English |
|---|---|
| Pages (from-to) | 4064-4090 |
| Number of pages | 27 |
| Journal | International Journal of Computer Vision (IJCV) |
| Volume | 133 |
| Issue number | 7 |
| Early online date | 20 Feb 2025 |
| DOIs | |
| Publication status | Published - 1 Jul 2025 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.