TY - JOUR
T1 - On the Opportunities and Challenges of Foundation Models for GeoAI (Vision Paper)
AU - Mai, Gengchen
AU - Huang, Weiming
AU - Sun, Jin
AU - Song, Suhang
AU - Mishra, Deepak
AU - Liu, Ninghao
AU - Gao, Song
AU - Liu, Tianming
AU - Cong, Gao
AU - Hu, Yingjie
AU - Cundy, Chris
AU - Li, Ziyuan
AU - Zhu, Rui
AU - Lao, Ni
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/7/1
Y1 - 2024/7/1
N2 - Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have not yet seen an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first investigate the potential of many existing FMs by testing their performance on seven tasks across multiple geospatial domains, including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. Our results indicate that on several geospatial tasks that only involve text modality, such as toponym recognition, location description recognition, and US state-level/county-level dementia time series forecasting, the task-agnostic large language models (LLMs) can outperform task-specific fully supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities (e.g., POI-based urban function classification, street view image-based urban noise intensity classification, and remote sensing image scene classification), existing FMs still underperform task-specific models. Based on these observations, we propose that one of the major challenges of developing an FM for GeoAI is to address the multimodal nature of geospatial tasks. After discussing the distinct challenges of each geospatial data modality, we suggest the possibility of a multimodal FM that can reason over various types of geospatial data through geospatial alignments. We conclude this article by discussing the unique risks and challenges to developing such a model for GeoAI.
AB - Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have not yet seen an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first investigate the potential of many existing FMs by testing their performance on seven tasks across multiple geospatial domains, including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. Our results indicate that on several geospatial tasks that only involve text modality, such as toponym recognition, location description recognition, and US state-level/county-level dementia time series forecasting, the task-agnostic large language models (LLMs) can outperform task-specific fully supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities (e.g., POI-based urban function classification, street view image-based urban noise intensity classification, and remote sensing image scene classification), existing FMs still underperform task-specific models. Based on these observations, we propose that one of the major challenges of developing an FM for GeoAI is to address the multimodal nature of geospatial tasks. After discussing the distinct challenges of each geospatial data modality, we suggest the possibility of a multimodal FM that can reason over various types of geospatial data through geospatial alignments. We conclude this article by discussing the unique risks and challenges to developing such a model for GeoAI.
KW - Foundation models
KW - geospatial artificial intelligence
KW - multimodal learning
UR - http://www.scopus.com/inward/record.url?scp=85193724004&partnerID=8YFLogxK
U2 - 10.1145/3653070
DO - 10.1145/3653070
M3 - Article (Academic Journal)
AN - SCOPUS:85193724004
SN - 2374-0353
VL - 10
JO - ACM Transactions on Spatial Algorithms and Systems
JF - ACM Transactions on Spatial Algorithms and Systems
IS - 2
M1 - 11
ER -