15th International Work-Conference on Artificial Neural Networks (IWANN), Spain, 12-14 June 2019, vol. 11507, pp. 811-822
Convolutional Neural Networks (CNNs) have been shown to achieve the best performance in image classification problems, while Recurrent Neural Networks (RNNs) have been used to exploit temporal information in time series classification. The main goal of this paper is to examine how temporal information across frame sequences can be exploited by RNNs to improve video classification performance. Using transfer learning, this paper presents a comparative study of seven video classification network architectures, which utilize either global or local features extracted by VGG-16, a very deep CNN pre-trained for image classification. Hold-out validation was used to optimize the dropout rate and the number of units in the fully-connected layers of the proposed architectures. Each network architecture was run several times on different data splits, and the best architecture was identified using the independent t-test. Experimental results show that the architecture combining local features extracted by the pre-trained CNN with a ConvLSTM to exploit temporal information achieves the best accuracy in video classification.
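The following is a minimal sketch of the best-performing variant described above (frozen pre-trained VGG-16 producing local feature maps per frame, aggregated over time by a ConvLSTM), assuming a Keras/TensorFlow implementation. The clip length, filter counts, dropout rate, fully-connected width, and class count are illustrative placeholders, not the paper's tuned settings.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES, HEIGHT, WIDTH, NUM_CLASSES = 16, 224, 224, 10  # assumed values

# Pre-trained VGG-16 without its classifier head yields local (spatial)
# feature maps of shape (7, 7, 512) for each 224x224 input frame.
vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                  input_shape=(HEIGHT, WIDTH, 3))
vgg.trainable = False  # transfer learning: convolutional weights stay frozen

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, HEIGHT, WIDTH, 3)),
    # Apply the frozen CNN to every frame of the clip independently.
    layers.TimeDistributed(vgg),
    # ConvLSTM aggregates the per-frame feature maps over time while
    # preserving their spatial structure.
    layers.ConvLSTM2D(filters=64, kernel_size=(3, 3), padding="same",
                      return_sequences=False),
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),                   # dropout rate tuned via hold-out validation
    layers.Dense(256, activation="relu"),  # fully-connected width also tuned
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The TimeDistributed wrapper is what turns the image-classification CNN into a per-frame feature extractor; swapping the ConvLSTM2D layer for a plain LSTM over flattened (global) features would give one of the competing architectures in the comparison.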
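The model-selection step can likewise be sketched with an independent two-sample t-test over the per-split accuracies of two competing architectures. The accuracy lists below are fabricated placeholders for illustration only, not the paper's results.

```python
from scipy import stats

# Hypothetical per-split test accuracies from repeated runs of two architectures.
acc_convlstm = [0.91, 0.89, 0.92, 0.90, 0.91]  # placeholder numbers
acc_lstm     = [0.87, 0.88, 0.86, 0.88, 0.87]  # placeholder numbers

t_stat, p_value = stats.ttest_ind(acc_convlstm, acc_lstm)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # small p => significant difference
```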