At relatively high signal-to-noise ratio (SNR), sequential decoding can achieve high-throughput convolutional decoding with much lower computational complexity than the Viterbi algorithm (VA). This paper investigates a parallel bidirectional Fano algorithm (BFA) decoding architecture. To increase the utilisation of the parallel BFA decoders, and thus improve decoding throughput, a state estimation method is proposed that effectively partitions a long codeword into multiple short sub-codewords. Parallel BFA decoding with state estimation is shown to improve decoding throughput by 30-55% compared with parallel BFA decoding without state estimation. For similar error-rate performance, parallel BFA decoding requires only 3-30% of the computational complexity of the VA.
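As background for the Fano-based decoders discussed above, the following is a minimal sketch of the classic unidirectional Fano algorithm. It is not the bidirectional, parallel, or state-estimation scheme of this paper. It assumes a hypothetical setup: the rate-1/2, memory-2 (7,5) convolutional code with a zero tail, hard decisions over a binary symmetric channel with crossover probability `p`, the Fano bit metric, and an illustrative threshold spacing `delta`. The names `branch`, `encode`, `fano_decode`, and `max_steps` are introduced only for this example.

```python
import math

# Hypothetical example code (an assumption): rate-1/2, memory-2 (7,5) code.
G = (0b111, 0b101)  # generator taps; bit 2 of the register is the current input
MEM = 2


def branch(state, bit):
    """Return (output bits, next state) for one encoder step."""
    reg = (bit << MEM) | state
    out = [bin(reg & g).count("1") & 1 for g in G]
    return out, reg >> 1


def encode(info):
    """Encode info bits, appending MEM zero tail bits to terminate the trellis."""
    state, out = 0, []
    for bit in list(info) + [0] * MEM:
        code, state = branch(state, bit)
        out.extend(code)
    return out


def fano_decode(rx, n_info, p=0.05, delta=2.0, max_steps=100_000):
    """Unidirectional Fano decoding of hard-decision bits from an assumed BSC."""
    n = len(G)
    rate = 1 / n
    good = math.log2(2 * (1 - p)) - rate  # Fano bit metric when bits agree
    bad = math.log2(2 * p) - rate         # Fano bit metric when bits disagree
    depth = n_info + MEM
    states = [0] * (depth + 1)
    metrics = [0.0] * (depth + 1)
    bits = [0] * depth
    rank_taken = [0] * depth  # 0 = best successor taken, 1 = second best

    def successors(d):
        """Successors of the node at depth d, sorted best metric first."""
        r = rx[d * n:(d + 1) * n]
        inputs = (0, 1) if d < n_info else (0,)  # tail inputs are forced to 0
        cands = []
        for b in inputs:
            code, nxt = branch(states[d], b)
            inc = sum(good if c == y else bad for c, y in zip(code, r))
            cands.append((metrics[d] + inc, b, nxt))
        cands.sort(key=lambda c: -c[0])  # stable sort: ties keep input 0 first
        return cands

    T, d, rank = 0.0, 0, 0
    for _ in range(max_steps):
        cands = successors(d)
        if rank < len(cands) and cands[rank][0] >= T:
            # Move forward to the successor currently being examined.
            m_next, b, nxt = cands[rank]
            bits[d], rank_taken[d] = b, rank
            prev = metrics[d]
            d += 1
            states[d], metrics[d] = nxt, m_next
            if d == depth:
                return bits[:n_info]
            if prev < T + delta:  # first visit of this node: tighten threshold
                while m_next >= T + delta:
                    T += delta
            rank = 0
        elif d > 0 and metrics[d - 1] >= T:
            # Move back, then examine the parent's next-best successor
            # (if none is left, the next iteration keeps looking back).
            d -= 1
            rank = rank_taken[d] + 1
        else:
            # Cannot move back: loosen the threshold and retry the best successor.
            T -= delta
            rank = 0
    return None  # computation limit reached (treated as a decoding erasure)


if __name__ == "__main__":
    info = [1, 0, 1, 1]
    rx = encode(info)
    rx[2] ^= 1  # inject one channel error
    print(fano_decode(rx, len(info)))  # [1, 0, 1, 1]
```

The sketch makes the complexity argument concrete. When few errors occur, the decoder mostly moves forward one branch per step. Noisy segments force back-and-forth searching with a lowered threshold `T`. That variable workload is what a parallel architecture must balance across decoders.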