Much recent work on action recognition in video employs action parts, attributes, and similar mid- and high-level features to represent an action. However, such parts and attributes can be weakly discriminative and are often difficult to obtain. In this paper, we present an approach that uses mid-level discriminative spatial-temporal volumes to recognize human actions. Each spatial-temporal volume is represented by a feature graph constructed over the local collection of feature points (e.g., cuboids, STIPs) located in that volume. First, we densely sample spatial-temporal volumes from the training videos and construct a feature graph for each volume. Then, all feature graphs are clustered with a spectral clustering method, where the distance between two feature graphs is computed using an efficient spectral method. We regard the resulting feature-graph clusters as video words and characterize videos within the bag-of-features framework, which we call the bag-of-feature-graphs framework. Final recognition is performed with a linear SVM classifier. We evaluate our algorithm on a publicly available human action dataset, and the experimental results show the effectiveness of our method.
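The abstract does not spell out the pipeline's internals, so the following is only a minimal sketch of how such a bag-of-feature-graphs pipeline could be assembled. It assumes each sampled volume is summarized by a symmetric adjacency matrix over its feature points, and it stands in for the paper's "efficient spectral method" with one common choice, an L2 distance between truncated Laplacian spectra. All helper names (`laplacian_spectrum`, `graph_distance`, `build_codebook`, `video_histogram`) are hypothetical, not from the paper.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.svm import LinearSVC


def laplacian_spectrum(adjacency, k=10):
    """Smallest k eigenvalues of the graph Laplacian, zero-padded to length k.

    `adjacency` is a symmetric matrix whose entries weight the
    spatio-temporal proximity of feature points inside one volume.
    """
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigvals = np.linalg.eigvalsh(laplacian)[:k]  # ascending order
    return np.pad(eigvals, (0, max(0, k - len(eigvals))))


def graph_distance(adj_a, adj_b, k=10):
    """One plausible spectral distance: L2 norm between Laplacian spectra."""
    return np.linalg.norm(laplacian_spectrum(adj_a, k) - laplacian_spectrum(adj_b, k))


def build_codebook(feature_graphs, n_words=200):
    """Cluster feature graphs into video words via spectral clustering.

    `feature_graphs` is a list of adjacency matrices, one per densely
    sampled spatial-temporal volume from the training videos.
    Returns the word index assigned to each training volume.
    """
    n = len(feature_graphs)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = graph_distance(feature_graphs[i], feature_graphs[j])
    # Turn pairwise distances into a similarity (affinity) matrix.
    affinity = np.exp(-dist / (dist.mean() + 1e-12))
    return SpectralClustering(n_clusters=n_words,
                              affinity="precomputed").fit_predict(affinity)


def video_histogram(word_ids, n_words=200):
    """Bag-of-feature-graphs descriptor: normalized histogram of word assignments."""
    hist = np.bincount(word_ids, minlength=n_words).astype(float)
    return hist / (hist.sum() + 1e-12)


# Training: stack one histogram per video into X, with action labels y,
# then fit the linear SVM used for final recognition:
#   clf = LinearSVC().fit(X, y)
```

For an unseen video, each of its volume graphs would be mapped to the nearest word (e.g., by spectral distance to a cluster representative) before building the histogram; the sketch omits that step for brevity.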