Abstract
Root-cause analysis (RCA) in large-scale microservice-based payment systems is challenging due to complex failure propagation along service dependencies, limited availability of labeled incident data, and heterogeneous service topologies across deployments. We propose ServiceGraph-FM, a pretrained graph-based model for RCA, where “foundation” denotes a self-supervised graph encoder pretrained on large-scale production cluster traces and then adapted to downstream diagnosis. ServiceGraph-FM introduces three components: (1) masked graph autoencoding pretraining to learn transferable service-dependency embeddings for cross-topology generalization; (2) a temporal relational diffusion module that models anomaly propagation as graph diffusion on dynamic service graphs (i.e., Laplacian-governed information flow with learnable edge propagation strengths); and (3) a causal attention mechanism that leverages multi-hop path signals to better separate likely causes from correlated downstream effects. Experiments on the Alibaba Cluster Trace and synthetic PayPal-style topologies show that ServiceGraph-FM outperforms state-of-the-art baselines, improving Top-1 accuracy by 23.7% and Top-3 accuracy by 18.4% on average, and reducing mean time to detection by 31.2%. In zero-shot deployment on unseen architectures, the pretrained model retains 78.3% of its fully fine-tuned performance, indicating strong transferability for practical incident management.
| Original language | English |
|---|---|
| Article number | 236 |
| Number of pages | 24 |
| Journal | Mathematics |
| Volume | 14 |
| Issue number | 2 |
| Early online date | 8 Jan 2026 |
| DOIs | |
| Publication status | E-pub ahead of print - 8 Jan 2026 |
Bibliographical note
Publisher Copyright:© 2026 by the authors.
Keywords
- temporal graph networks
- root-cause analysis
- microservice architecture
- anomaly detection
- AIOps
- graph-based models
- 68T07
Fingerprint
Dive into the research topics of 'ServiceGraph-FM: A Graph-Based Model with Temporal Relational Diffusion for Root-Cause Analysis in Large-Scale Payment Service Systems'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver