Improving bag-of-poses with semi-temporal pose descriptors for skeleton-based action recognition


Agahian S., Negin F., KÖSE C.

VISUAL COMPUTER, vol. 35, pp. 591-607, 2019 (journal indexed in SCI)

  • Volume: 35, Issue: 4
  • Publication Date: 2019
  • DOI: 10.1007/s00371-018-1489-7
  • Pages: pp. 591-607


Over the last few decades, human action recognition has become one of the most challenging tasks in the field of computer vision. Effortless and accurate extraction of 3D skeleton information has recently become possible by means of economical depth sensors and state-of-the-art deep learning approaches. In this study, we introduce a novel bag-of-poses framework for action recognition using 3D skeleton data. Our assumption is that any action can be represented by a set of predefined spatiotemporal poses. The pose descriptor is composed of three parts: the first part is the concatenation of the normalized coordinates of the skeleton joints; the second part is the temporal displacement of the joints with respect to a predefined temporal offset; and the third part is the temporal displacement with respect to the previous frame in the sequence. To generate the key poses, we apply K-means clustering over all the training pose descriptors of the dataset. An SVM classifier is trained on the generated key poses to classify each pose, and every action in the dataset is then encoded as a histogram of key poses. An ELM classifier is used for action recognition owing to its fast, accurate, and reliable performance compared with other classifiers. The proposed framework is validated on five publicly available benchmark 3D action datasets; it achieves state-of-the-art results on three of them and competitive results on the remaining two.
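As a rough illustration of the pipeline described above — the three-part pose descriptor, K-means key-pose generation, and key-pose histogram encoding — the following sketch uses plain NumPy. The joint count, the choice of joint 0 as the normalization root, the offset value, and the simple k-means loop are illustrative assumptions, not the paper's actual settings; the SVM pose-classification and ELM recognition stages are omitted.

```python
import numpy as np

def pose_descriptor(seq, t, offset=5):
    """Build the three-part descriptor for frame t of a (F, J, 3) skeleton sequence.
    Parts: root-normalized joint coordinates, displacement w.r.t. a temporal
    offset, and displacement w.r.t. the previous frame (illustrative sketch)."""
    pose = seq[t]
    norm = (pose - pose[0]).ravel()            # assumption: joint 0 is the root
    d_off = (pose - seq[max(t - offset, 0)]).ravel()   # offset displacement
    d_prev = (pose - seq[max(t - 1, 0)]).ravel()       # previous-frame displacement
    return np.concatenate([norm, d_off, d_prev])

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means over descriptor rows of X; returns k key-pose centroids."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C

def encode(seq, C, offset=5):
    """Encode a sequence as a normalized histogram over the key poses in C."""
    D = np.stack([pose_descriptor(seq, t, offset) for t in range(len(seq))])
    labels = np.argmin(((D[:, None] - C[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(C)).astype(float)
    return hist / hist.sum()
```

In the full framework, the resulting histograms (one per sequence) would be the feature vectors fed to the final action classifier.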