Improved Infilling of Missing Metadata from Expendable Bathythermographs (XBTs) Using Multiple Machine Learning Methods

Stephen Haddad*, Rachel E. Killick, Matthew D. Palmer, Mark J. Webb, Rachel Prudden, Francesco Capponi, Samantha V. Adams

*Corresponding author for this work

Research output: Contribution to journal, Article (Academic Journal), peer-review

3 Citations (Scopus)

Abstract

Historical in situ ocean temperature profile measurements are important for a wide range of ocean and climate research activities. A large proportion of the profile observations have been recorded using expendable bathythermographs (XBTs), and these require bias corrections for use in climate change studies. It is generally accepted that the bias, and therefore the bias correction, depends on the type of XBT used. However, poor historical metadata collection practices mean that the XBT probe type information is often missing (for 59% of profiles between 1967 and 2000), limiting the development of reliable bias corrections. We develop a process for estimating missing instrument type metadata (the combination of both model and manufacturer) systematically, constructing a machine learning pipeline based on thorough data exploration to inform these choices. The predicted instrument type, where missing, will facilitate improved XBT bias corrections. The new approach improves the accuracy of the XBT type classification compared to previous approaches, raising the recall value from 0.75 to 0.94. We also develop an approach to account for the uncertainty associated with metadata assignments using ensembles of decision trees, which could feed into an ensemble approach to creating ocean temperature datasets. We describe the challenges arising from the nature of the dataset in applying standard machine learning techniques to the problem. We have implemented this in a portable, reproducible way using standard data science tools, with a view to these techniques being applied to other similar problems in climate science.
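As a rough illustration of two ideas mentioned in the abstract, the sketch below computes recall for a classifier's probe-type predictions and turns the votes of an ensemble of trees into a probability for each probe type, from which alternative metadata assignments can be sampled. It is not the authors' pipeline: the function names, the probe-type labels, the vote values, and the choice of an unweighted mean over classes are all assumptions made for illustration.

```python
import random
from collections import Counter


def recall_per_class(y_true, y_pred):
    """Recall for each class: correct predictions / number of true members."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def vote_fractions(votes):
    """Convert one profile's per-tree votes into a fraction per probe type."""
    counts = Counter(votes)
    return {label: n / len(votes) for label, n in counts.items()}


def sample_assignment(fractions, rng):
    """Draw one probe type in proportion to the ensemble vote fractions."""
    labels = sorted(fractions)
    weights = [fractions[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


# Hypothetical labels and predictions, for illustration only.
y_true = ["T4", "T4", "T7", "T7", "T7", "DEEP BLUE"]
y_pred = ["T4", "T7", "T7", "T7", "T4", "DEEP BLUE"]
recalls = recall_per_class(y_true, y_pred)
# Unweighted mean over classes (an assumed averaging choice).
macro_recall = sum(recalls.values()) / len(recalls)

# Hypothetical votes from five trees for a single profile with missing metadata.
tree_votes = ["T4", "T4", "T7", "T4", "T7"]
fractions = vote_fractions(tree_votes)

# Each draw could label this profile in one member of an ensemble of datasets.
rng = random.Random(0)
ensemble_members = [sample_assignment(fractions, rng) for _ in range(10)]
```

Sampling from the vote fractions, instead of always taking the majority class, is one simple way to carry classification uncertainty forward into downstream products.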

Original language: English
Pages (from-to): 1367-1385
Number of pages: 19
Journal: Journal of Atmospheric and Oceanic Technology
Volume: 39
Issue number: 9
Publication status: Published - Sept 2022

Bibliographical note

Funding Information:
Acknowledgments. Mark Webb was supported by the Met Office Hadley Centre Climate Programme funded by BEIS and Defra.

Publisher Copyright:
© 2022, American Meteorological Society. All rights reserved.

Keywords

  • Classification
  • Data quality control
  • Data science
  • Decision trees
  • Machine learning
  • Ocean
  • Profilers, oceanic
  • Software
