Traditional methods for analyzing video of human actions extract the spatial content of individual image frames and the temporal variation across frames. Frames typically contain redundant spatial content and temporal invariance; for instance, background regions and stationary objects are spatially redundant and change little over time. This redundancy increases both storage requirements and computation time. This paper focuses on the analysis of key point data obtained by capturing body movement, hand gestures, and facial expressions for video-based sign language recognition. The key point data are obtained from OpenPose, which provides two-dimensional human pose estimates for multiple persons in real time. The K-means clustering method is applied to the key point data, and key frames are selected according to the centroids formed from that data. The method described in this paper generates the data required for deep learning applications.
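As an illustration of this key-frame selection idea, the sketch below clusters flattened per-frame (x, y) key point vectors with K-means and keeps, for each centroid, the frame whose key points lie nearest to it. This is a minimal sketch under stated assumptions, not the authors' implementation: the use of scikit-learn, the function name `select_key_frames`, the number of clusters, and the 137-keypoint layout (BODY_25 plus hands and face) are assumptions introduced for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_key_frames(keypoints, n_clusters=10, random_state=0):
    """Select key frames by clustering per-frame keypoint vectors.

    keypoints: array of shape (n_frames, n_keypoints * 2) holding the
               flattened (x, y) coordinates produced by a pose estimator
               such as OpenPose for each frame.
    Returns sorted indices of the frames closest to each cluster centroid.
    """
    kmeans = KMeans(n_clusters=n_clusters, random_state=random_state, n_init=10)
    labels = kmeans.fit_predict(keypoints)

    key_frame_indices = []
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        if members.size == 0:
            continue
        # Keep the member frame whose keypoint vector is nearest the centroid.
        dists = np.linalg.norm(keypoints[members] - kmeans.cluster_centers_[c], axis=1)
        key_frame_indices.append(int(members[np.argmin(dists)]))

    return sorted(key_frame_indices)

# Example: 300 frames, 137 keypoints (body + two hands + face), (x, y) each.
frames = np.random.rand(300, 137 * 2)
print(select_key_frames(frames, n_clusters=8))
```

Selecting one representative frame per centroid discards the spatially redundant, temporally invariant frames described above, so only the clusters' representative poses need to be stored and fed to a downstream deep learning model.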