HUMAN ACTION RECOGNITION FROM DEPTH MAPS AND POSTURES USING DEEP LEARNING

ABSTRACT:

In this paper, we present a method (Action-Fusion) for human action recognition from depth maps and posture data using convolutional neural networks (CNNs). Two input descriptors are used for action representation. The first input is a depth motion image (DMI) that accumulates consecutive depth maps of a human action, while the second is a proposed moving joints descriptor (MJD) that represents the motion of body joints over time. To maximize feature extraction for accurate action classification, three CNN channels are trained with different inputs: the first with DMIs, the second with both DMIs and MJDs together, and the third with MJDs only. The action predictions generated by the three CNN channels are fused for the final action classification, and several fusion score operations are proposed to maximize the score of the correct action. The experiments show that fusing the outputs of all three channels gives better results than using one channel or fusing only two. Our proposed method was evaluated on three public datasets: 1) the Microsoft action 3-D dataset (MSRAction3D); 2) the University of Texas at Dallas multimodal human action dataset (UTD-MHAD); and 3) the multimodal action dataset (MAD). The testing results indicate that the proposed approach outperforms most existing state-of-the-art methods, such as histogram of oriented 4-D normals and Actionlet on MSRAction3D. Although the MAD dataset contains a large number of actions (35) compared with existing RGB-D action datasets, the proposed method surpasses a state-of-the-art method on that dataset by 6.84%.
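For illustration, the sketch below shows one plausible way to build a DMI by accumulating consecutive depth maps. The paper's exact accumulation rule is not given here, so the absolute-difference formulation, the function name, and its parameters are assumptions, not the authors' implementation.

```python
# A minimal DMI sketch, assuming the descriptor accumulates absolute
# differences between consecutive depth maps (the paper's exact rule may differ).
import numpy as np

def depth_motion_image(depth_maps):
    """depth_maps: array of shape (T, H, W) holding T consecutive depth frames.

    Returns an (H, W) image normalized to [0, 255], suitable as CNN input.
    """
    frames = np.asarray(depth_maps, dtype=np.float32)
    # Per-pixel motion energy: sum of absolute inter-frame differences.
    motion = np.abs(np.diff(frames, axis=0)).sum(axis=0)
    # Normalize to an 8-bit grayscale image for the CNN channel.
    motion -= motion.min()
    if motion.max() > 0:
        motion = motion / motion.max() * 255.0
    return motion.astype(np.uint8)
```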

EXISTING SYSTEM:

Human action recognition is necessary for various computer vision applications that demand information about people's behavior, including surveillance for public safety, human–computer interaction, and robotics [1]–[6]. There are a variety of human action recognition systems, such as video-based [7]–[10], wearable sensor-based [11]–[15], and wireless sensor network-based [16], [17] approaches. Among these, video-based human action recognition techniques have received the most research attention, owing to their high recognition accuracy and easy deployment, and have been widely applied in industrial applications.

PROPOSED SYSTEM:

The action recognition process introduced in this paper involves three CNN channels trained with DMI and MJD descriptors for feature extraction and classification. The first channel is trained with DMI; the second channel combines two subchannels, one trained with DMI and the other with MJD; and the third channel is trained with MJD only. Each channel generates its own scores for the actions.

Our experiments showed that simply taking the maximum score value across the three CNN channels leads to low prediction accuracy on the testing data. To maximize the score of the correct action, five score operations are proposed and analyzed to select the operation that predicts the correct action most accurately. In total, the proposed approach produces three outputs from the CNN channels and five further outputs from fusion operations between the three channels; the maximum action score value across all these outputs is taken as the final action prediction. The results generated from fusing the three CNN channels are better than those generated using a single channel or only two fused channels. In fact, each channel learns features that cannot be seen in the other channels, which is why combining them produces better results (see the fusion sketch below).

The experimental results of the proposed approach are compared with state-of-the-art methods on three public datasets: 1) the Microsoft Action 3-D dataset (MSRAction3D); 2) the University of Texas at Dallas multimodal human action dataset (UTD-MHAD); and 3) the multimodal action dataset (MAD). The comparison shows that the action recognition accuracy is better than that of most existing methods, and that the recognition accuracy remains stable even with a large number of actions, as on the MAD dataset.
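As an illustration of the late-fusion step, the sketch below fuses per-class score vectors from the three channels and takes the maximum score over all eight outputs (three channel outputs plus five fused outputs), as described above. The five concrete operations shown (mean, product, max, min, median) and the function name are assumptions for illustration; the paper's own five operations may differ.

```python
# A minimal late-fusion sketch, assuming common score operations
# (mean, product, max, min, median) stand in for the paper's five.
import numpy as np

def fuse_and_predict(s1, s2, s3, class_names=None):
    """s1, s2, s3: per-class score vectors from the three CNN channels."""
    scores = np.stack([s1, s2, s3])            # shape (3, num_classes)
    outputs = {
        "channel1": s1,
        "channel2": s2,
        "channel3": s3,
        "mean":    scores.mean(axis=0),        # assumed fusion op
        "product": scores.prod(axis=0),        # assumed fusion op
        "max":     scores.max(axis=0),         # assumed fusion op
        "min":     scores.min(axis=0),         # assumed fusion op
        "median":  np.median(scores, axis=0),  # assumed fusion op
    }
    # Final prediction: the class holding the maximum score over all outputs.
    best_scores = max(outputs.values(), key=lambda v: v.max())
    cls = int(np.argmax(best_scores))
    return cls if class_names is None else class_names[cls]
```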

SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Programming Language : Python
• Front End Technologies : Tkinter/Web (HTML, CSS, JS)
• IDE : Jupyter/Spyder/VS Code
• Operating System : Windows 8/10

HARDWARE REQUIREMENTS:

• Processor : Core i3
• RAM Capacity : 2 GB
• Hard Disk : 250 GB
• Monitor : 15″ Color
• Mouse : 2 or 3 Button Mouse
• Key Board : Standard Windows Keyboard

For More Details of Project Document, PPT, Screenshots and Full Code
Call/WhatsApp – 9966645624
Email – info@srithub.com
