
Geo-R1: Improving few-shot geospatial referring expression understanding with reinforcement fine-tuning

Zilun Zhang, Zian Guan, Tiancheng Zhao*, Haozhan Shen, Yuxiang Cai, Zhonggen Su, Yongheng Shang, Zhaojun Liu*, Jianwei Yin*, Xiang Li*

*Corresponding author for this work

Research output: Contribution to journal > Article (Academic Journal) > peer-review

Abstract

Referring expression understanding in remote sensing poses unique challenges, as it requires reasoning over complex object–context relationships. While supervised fine-tuning (SFT) of multimodal large language models (MLLMs) achieves strong performance with massive labeled datasets, it struggles in data-scarce scenarios, leading to poor generalization. To address this limitation, we propose Geo-R1, a reasoning-centric reinforcement fine-tuning (RFT) paradigm for few-shot geospatial referring. Geo-R1 generates explicit, interpretable reasoning chains that decompose referring expressions, and then leverages these rationales to localize target objects. We validate Geo-R1 on three carefully designed few-shot geospatial referring benchmarks, where our model consistently and substantially outperforms SFT baselines. It also demonstrates strong cross-dataset generalization, highlighting its robustness. Code and data will be released at https://github.com/Geo-R1/geo-r1.
Original language: English
Pages (from-to): 113-129
Number of pages: 17
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Volume: 237
Early online date: 22 Apr 2026
Publication status: E-pub ahead of print - 22 Apr 2026

Bibliographical note

Publisher Copyright:
© 2026 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS).
