The limit order book (LOB) depicts the fine-grained demand and supply relationship for financial assets and is widely used in market microstructure studies. Nevertheless, the limited availability and high cost of LOB data restrict its wider application. The LOB recreation model (LOBRM) was recently proposed to bridge this gap by synthesizing the LOB from trades and quotes (TAQ) data. However, the original LOBRM study had two limitations: (1) experiments were conducted on a relatively small dataset containing only one day of LOB data; and (2) training and testing were performed in a non-chronological fashion, which essentially re-frames the task as interpolation and potentially introduces lookahead bias. In this study, we extend the research on LOBRM and further validate its use in real-world application scenarios. We first advance the workflow of LOBRM by (1) adding a time-weighted z-score standardization for the LOB and (2) substituting the ordinary differential equation kernel with an exponential decay kernel to lower computational complexity. Experiments are conducted on the extended LOBSTER dataset in a chronological fashion, as the model would be used in a real-world application. We find that (1) LOBRM with the decay kernel is superior to traditional non-linear models, and module ensembling is effective; (2) prediction accuracy is negatively related to the volatility of order volumes resting in the LOB; (3) the proposed sparse encoding method for TAQ exhibits good generalization ability and can facilitate manifold tasks; and (4) the influence of stochastic drift on prediction accuracy can be alleviated by increasing historical samples.
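The abstract does not spell out the time-weighted z-score standardization, so the following is only a minimal sketch of one common reading: each observed value is weighted by how long it remained in force before the next update. The function name `time_weighted_zscore` and its parameters (`values`, `timestamps`, `end_time`) are hypothetical, and the paper's actual formulation may differ.

```python
import math


def time_weighted_zscore(values, timestamps, end_time):
    """Standardize values using a mean and std weighted by holding time.

    Assumption (not taken from the paper): values[i] is in force from
    timestamps[i] until timestamps[i + 1], and the last value is in force
    until end_time.
    """
    if len(values) != len(timestamps) or len(values) == 0:
        raise ValueError("values and timestamps must be non-empty and equal in length")

    # Duration for which each value stays in force.
    boundaries = list(timestamps) + [end_time]
    durations = [later - earlier for earlier, later in zip(boundaries, boundaries[1:])]
    if any(d < 0 for d in durations):
        raise ValueError("timestamps must be non-decreasing and end_time must not precede them")

    total = sum(durations)
    if total <= 0:
        raise ValueError("total observation time must be positive")

    # Time-weighted mean and (population) variance.
    mean = sum(d * v for d, v in zip(durations, values)) / total
    variance = sum(d * (v - mean) ** 2 for d, v in zip(durations, values)) / total
    std = math.sqrt(variance)

    if std == 0:
        # Constant series: every z-score is defined as zero here.
        return [0.0] * len(values), mean, std
    return [(v - mean) / std for v in values], mean, std


# Example: a volume of 100 rests for 9 seconds, then 200 for 1 second.
z_scores, mean, std = time_weighted_zscore([100, 200], [0, 9], 10)
```

In a chronological evaluation, the mean and standard deviation would be estimated on the training period only and then reused to standardize later data, so that no information from the test period leaks into the inputs.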
|Title of host publication||European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Applied Data Science Track|
|Subtitle of host publication||Applied Data Science Track - European Conference, ECML PKDD 2021, Proceedings|
|Editors||Yuxiao Dong, Nicolas Kourtellis, Barbara Hammer, Jose A. Lozano|
|Number of pages||17|
|Publication status||Published - 10 Sep 2021|
|Event||European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2021) - Virtual (Duration: 13 Sep 2021 → 17 Sep 2021)|
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Conference||European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2021)|
|Abbreviated title||ECML-PKDD 2021|
|Period||13/09/21 → 17/09/21|
Bibliographical note
Funding Information:
Acknowledgements. Zijian Shi’s PhD is supported by a China Scholarship Council (CSC)/University of Bristol joint-funded scholarship. John Cartlidge is sponsored by Refinitiv.
© 2021, Springer Nature Switzerland AG.
Keywords
- Limit Order Book
- Time series prediction
- Financial machine learning