Simple Effective Methods for Decision-Level Fusion in Two-Stream Convolutional Neural Networks for Video Classification

Savran Kiziltepe R., Gan J. Q.

21st International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2020, Guimaraes, Portugal, 4-6 November 2020, vol. 12489 LNCS, pp.77-87

  • Publication Type: Conference Paper / Full Text
  • Volume: 12489 LNCS
  • Doi Number: 10.1007/978-3-030-62362-3_8
  • City: Guimaraes
  • Country: Portugal
  • Page Numbers: pp.77-87
  • Keywords: Action recognition, Convolutional neural networks, Decision fusion, Deep learning, Video classification
  • Karadeniz Technical University Affiliated: No


© 2020, Springer Nature Switzerland AG.

Convolutional Neural Networks (CNNs) have recently been applied to video classification, where various methods for combining the appearance (spatial) and motion (temporal) information in video clips have been considered. The most common method for combining spatial and temporal information for video classification is averaging the prediction scores at the softmax layer. Inspired by the Mycin uncertainty system for combining production rules in expert systems, this paper proposes using the Mycin formula for decision fusion in two-stream convolutional neural networks. Based on the intuition that spatial information is more useful than temporal information for video classification, this paper also proposes multiplication and asymmetrical multiplication for decision fusion, aiming to better combine spatial and temporal information in two-stream convolutional neural networks. The experimental results show that (i) both spatial and temporal information are important, but the decision from the spatial stream should dominate, with the decision from the temporal stream acting as a complement, and (ii) the proposed asymmetrical multiplication method for decision fusion significantly outperforms both the Mycin method and the averaging method.
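The following is a minimal Python sketch of per-class decision fusion of two softmax score vectors. It is meant to make the fusion rules named above concrete. Averaging, multiplication and the positive-evidence case of the MYCIN combining function (cf1 + cf2 - cf1 * cf2) follow their standard definitions. The abstract does not define asymmetrical multiplication, so `weighted_product_fusion` and its exponent `gamma` are a hypothetical stand-in. They illustrate only the idea of letting the spatial stream dominate and are not the paper's formula. All function names and the example scores are illustrative assumptions.

```python
from typing import List, Sequence


def average_fusion(spatial: Sequence[float], temporal: Sequence[float]) -> List[float]:
    """Baseline: per-class mean of the two streams' softmax scores."""
    return [(s + t) / 2.0 for s, t in zip(spatial, temporal)]


def product_fusion(spatial: Sequence[float], temporal: Sequence[float]) -> List[float]:
    """Symmetric multiplication: per-class product of the two streams' scores."""
    return [s * t for s, t in zip(spatial, temporal)]


def mycin_fusion(spatial: Sequence[float], temporal: Sequence[float]) -> List[float]:
    """MYCIN combination of two non-negative certainty factors: cf1 + cf2 - cf1 * cf2.

    Softmax scores lie in [0, 1], so only the positive-evidence case of the
    MYCIN combining function is needed here.
    """
    return [s + t - s * t for s, t in zip(spatial, temporal)]


def weighted_product_fusion(
    spatial: Sequence[float], temporal: Sequence[float], gamma: float = 0.5
) -> List[float]:
    """HYPOTHETICAL asymmetric variant (an assumption, not the paper's definition).

    Computes spatial * temporal ** gamma. With 0 <= gamma < 1 the temporal
    scores are damped, so they refine rather than override the spatial
    decision; gamma = 1 recovers product_fusion, gamma = 0 keeps spatial only.
    """
    return [s * t ** gamma for s, t in zip(spatial, temporal)]


def predict(fused: Sequence[float]) -> int:
    """Index of the highest fused score, i.e. the predicted class."""
    return max(range(len(fused)), key=lambda i: fused[i])


if __name__ == "__main__":
    spatial = [0.6, 0.3, 0.1]   # made-up softmax output of the spatial (appearance) stream
    temporal = [0.2, 0.7, 0.1]  # made-up softmax output of the temporal (motion) stream
    for name, fuse in [("average", average_fusion), ("product", product_fusion),
                       ("mycin", mycin_fusion), ("weighted", weighted_product_fusion)]:
        print(name, predict(fuse(spatial, temporal)))
```

With these made-up scores, the three symmetric rules all pick class 1, which is the temporal stream's top class. The damped weighted product instead picks class 0, which is the spatial stream's top class. This shows how an asymmetric rule can make one stream dominant while the other still contributes.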