Abstract
In this work, we exploit multi-task learning to jointly predict the two decision-making processes of gaze movement and probe manipulation that an experienced sonographer performs in routine obstetric scanning. A multimodal guidance framework, Multimodal-GuideNet, is proposed to capture the causal relationship between a real-world ultrasound video signal, synchronized gaze, and probe motion. The association between the multi-modality inputs is learned and shared through a modality-aware spatial graph that leverages useful cross-modal dependencies. By estimating the probability distribution of probe and gaze movements in real scans, the predicted guidance signals also accommodate inter- and intra-sonographer variation and avoid a fixed scanning path. We validate the new multi-modality approach on three types of obstetric scanning examinations, and the results consistently outperform single-task learning under various guidance policies. To simulate a sonographer's attention on multi-structure images, we also explore multi-step estimation in gaze guidance, and the visual results show that the prediction allows multiple gaze centers that are substantially aligned with underlying anatomical structures.
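To make the multi-task formulation concrete, the sketch below shows a shared encoder over image, gaze, and probe inputs feeding two task heads that each parameterise a Gaussian over the next gaze shift and probe motion, so the output is a distribution rather than a single fixed path. This is a minimal illustration under assumed names, feature sizes, and a simple fused recurrent state standing in for the modality-aware spatial graph; it is not the published Multimodal-GuideNet implementation.

```python
# Minimal multi-task guidance sketch: a shared image/gaze/probe encoder feeds
# two heads that output Gaussian parameters for the next gaze displacement
# (2-D) and probe motion (here a 3-D increment). All module names, dimensions,
# and architectural choices are illustrative assumptions.
import torch
import torch.nn as nn


class MultiTaskGuideSketch(nn.Module):
    def __init__(self, img_dim=128, gaze_dim=2, probe_dim=3, hidden=64):
        super().__init__()
        # Per-modality embeddings; the paper's modality-aware graph is
        # approximated here by plain concatenation of modality features.
        self.img_enc = nn.Linear(img_dim, hidden)
        self.gaze_enc = nn.Linear(gaze_dim, hidden)
        self.probe_enc = nn.Linear(probe_dim, hidden)
        self.fuse = nn.GRUCell(3 * hidden, hidden)  # shared temporal state
        # Each head predicts mean and log-variance, so the guidance signal is
        # a distribution that tolerates inter-/intra-sonographer variation.
        self.gaze_head = nn.Linear(hidden, 2 * gaze_dim)
        self.probe_head = nn.Linear(hidden, 2 * probe_dim)

    def forward(self, img_feat, gaze, probe, state):
        x = torch.cat([self.img_enc(img_feat),
                       self.gaze_enc(gaze),
                       self.probe_enc(probe)], dim=-1)
        state = self.fuse(x, state)
        gaze_mu, gaze_logvar = self.gaze_head(state).chunk(2, dim=-1)
        probe_mu, probe_logvar = self.probe_head(state).chunk(2, dim=-1)
        return (gaze_mu, gaze_logvar), (probe_mu, probe_logvar), state


# Toy usage: one time step with random inputs.
model = MultiTaskGuideSketch()
state = torch.zeros(1, 64)
(gaze_mu, _), (probe_mu, _), state = model(
    torch.randn(1, 128), torch.randn(1, 2), torch.randn(1, 3), state)
print(gaze_mu.shape, probe_mu.shape)  # torch.Size([1, 2]) torch.Size([1, 3])
```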
| Field | Value |
|---|---|
| Original language | English |
| Article number | 102981 |
| Number of pages | 10 |
| Journal | Medical Image Analysis |
| Volume | 90 |
| Early online date | 29 Sept 2023 |
| DOIs | |
| Publication status | Published - 1 Dec 2023 |
Bibliographical note
Publisher Copyright: © 2023 The Authors. Published by Elsevier B.V.
Keywords
- Fetal ultrasound
- Multi-task learning
- Multimodal representation learning
- Probe guidance