Abstract
Human Action Recognition (HAR) is an important and challenging problem because of the high variability both between repetitions of a task by the same subject and between different subjects. This work is motivated by the need for time-series signal classification with robust validation and test procedures. This study proposes to classify 60 American Sign Language signs from data provided by the Leap Motion sensor, using a combined approach, called ConvNet-LSTM, that couples a Convolutional Neural Network (ConvNet) with a Recurrent Neural Network built on Long Short-Term Memory (LSTM) cells. In addition, a complete kinematic model of the right and left forearm/hand/fingers/thumb is proposed, together with a simple data augmentation technique that improves the generalization of the neural networks. With data augmentation, the ConvNet-LSTM reached an accuracy of 89.3% on a user-independent test set, while the LSTM alone reached 85.0% on the same test set. Without data augmentation, these results dropped to 85.9% and 81.4%, respectively.
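To make the ConvNet-LSTM idea concrete, the forward pass of such a combined model can be sketched as below. This is a minimal NumPy illustration, not the paper's implementation: a 1-D convolution extracts local motion features from a sequence of kinematic measurements, and an LSTM summarizes the resulting feature sequence before a softmax classifier over the 60 sign classes. All dimensions (30 frames, 12 features, 16 filters, 32 LSTM units) are hypothetical placeholders, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_relu(x, w, b):
    """Valid 1-D convolution over time followed by ReLU.
    x: (T, C_in), w: (k, C_in, C_out), b: (C_out,)."""
    k, _, c_out = w.shape
    t_out = x.shape[0] - k + 1
    out = np.empty((t_out, c_out))
    for t in range(t_out):
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)

def lstm_last_hidden(x, Wx, Wh, b):
    """Run an LSTM over the sequence, return the final hidden state.
    x: (T, C), Wx: (4H, C), Wh: (4H, H), b: (4H,)."""
    H = Wh.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for xt in x:
        z = Wx @ xt + Wh @ h + b          # stacked gate pre-activations
        i, f, g, o = np.split(z, 4)       # input, forget, cell, output gates
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                 # update cell state
        h = o * np.tanh(c)                # update hidden state
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical dimensions: 30 frames of 12 kinematic features,
# 16 conv filters of width 3, 32 LSTM units, 60 sign classes.
T, F, K, C_OUT, H, N_CLASSES = 30, 12, 3, 16, 32, 60

x = rng.standard_normal((T, F))                  # one recorded sequence
w_conv = rng.standard_normal((K, F, C_OUT)) * 0.1
b_conv = np.zeros(C_OUT)
Wx = rng.standard_normal((4 * H, C_OUT)) * 0.1
Wh = rng.standard_normal((4 * H, H)) * 0.1
b_lstm = np.zeros(4 * H)
W_out = rng.standard_normal((N_CLASSES, H)) * 0.1
b_out = np.zeros(N_CLASSES)

features = conv1d_relu(x, w_conv, b_conv)        # (28, 16) local motion features
h_final = lstm_last_hidden(features, Wx, Wh, b_lstm)
probs = softmax(W_out @ h_final + b_out)         # probabilities over the 60 signs
print(probs.shape, round(float(probs.sum()), 6))
```

In a trained model, the class with the highest probability would be taken as the recognized sign; here the output is only meant to show how the convolutional features feed the recurrent stage.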