TY - JOUR
T1 - Financial data science: the birth of a new financial research paradigm complementing econometrics?
AU - Brooks, Chris
AU - Hoepner, Andreas G. F.
AU - McMillan, David
AU - Vivian, Andrew
AU - Simen, Chardin Wese
PY - 2019/3/1
Y1 - 2019/3/1
N2 - Financial data science and econometrics are highly complementary. They share an equivalent research process, with the former's intellectual point of departure being statistical inference and the latter's being the data sets themselves. Two challenges arise, however, from digitalisation. First, ever-increasing computational power allows researchers to experiment with an extremely large number of generated test subjects (i.e. p-hacking). We argue that p-hacking can be mitigated through adjustments for multiple hypothesis testing where appropriate. However, it can only truly be addressed via a strong focus on integrity (e.g. pre-registration, actual out-of-sample periods). Second, the extremely large number of observations available in big data sets provides levels of statistical power at which common statistical significance levels are barely relevant. This challenge can be addressed in two ways. First, researchers can use more stringent statistical significance levels, such as 0.1%, 0.5% and 1% instead of 1%, 5% and 10%, respectively. Second, and more importantly, researchers can use criteria such as economic significance, economic relevance and statistical relevance to assess the robustness of statistically significant coefficients. Statistical relevance in particular seems crucial, as it is far from impossible for an individual coefficient to be considered statistically significant when its actual statistical relevance (i.e. incremental explanatory power) is extremely small.
U2 - 10.1080/1351847x.2019.1662822
DO - 10.1080/1351847x.2019.1662822
M3 - Article (Academic Journal)
SN - 1351-847X
VL - 25
SP - 1627
EP - 1636
JO - European Journal of Finance
JF - European Journal of Finance
IS - 17
ER -