Paper
22 March 2019

Classification of speaking activity based on lip features in a sequence of video frames
Proceedings Volume 11049, International Workshop on Advanced Image Technology (IWAIT) 2019; 110491K (2019) https://doi.org/10.1117/12.2521574
Event: 2019 Joint International Workshop on Advanced Image Technology (IWAIT) and International Forum on Medical Imaging in Asia (IFMIA), 2019, Singapore, Singapore
Abstract
In human activity classification, detecting speaking activity has further applications in behavior analysis, such as analyzing student learning behavior in an active learning environment. This paper presents a method for classifying whether or not a person is speaking based on lip movement in a video sequence. Assuming that a person of interest is tracked within a room using multiple cameras, at least one camera can capture the face of the target person at every instant of time. Using this sequence of frames of the target person, this paper proposes a method for continuously deciding whether the person is speaking. First, the head region is segmented based on (1) the position of the top of the head and (2) the head's width together with the golden ratio between the head's height and width. Second, the face area is extracted using a skin detection technique. Third, the mouth area in each frame is segmented based on its geometric position on the face and on the mouth's color, which differs from that of the facial skin. Next, the opening of the mouth is roughly detected, exploiting the fact that the open-mouth area has a darker gray level than the average of the mouth region. Finally, only the frequency components between 1 Hz and 10 Hz of the detected feature signal are retained, and the speaking activity is classified by comparing this band-limited signal with a threshold. The proposed method is tested with three sets of videos. The results show that the speaking classification and mouth detection achieve 93% and 94% accuracy, respectively.
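The final step above, keeping only the 1 Hz to 10 Hz components of the per-frame mouth-opening signal and thresholding, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the FFT-based band-pass filter, the function names (`bandpass_energy`, `is_speaking`), the RMS energy measure, and the threshold value are all assumptions, since the abstract does not specify how the filtering and comparison are carried out.

```python
import numpy as np

def bandpass_energy(signal, fps, low=1.0, high=10.0):
    """Keep only the low-high Hz components of the per-frame
    mouth-opening feature signal (via a real FFT) and return the
    RMS energy of the filtered signal. The FFT filter and RMS
    measure are illustrative assumptions, not the paper's method."""
    n = len(signal)
    spectrum = np.fft.rfft(signal - np.mean(signal))
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)
    spectrum[(freqs < low) | (freqs > high)] = 0.0  # zero bins outside the band
    filtered = np.fft.irfft(spectrum, n)
    return np.sqrt(np.mean(filtered ** 2))

def is_speaking(signal, fps, threshold=0.02):
    # Classify the window as "speaking" when the band-limited
    # energy exceeds a (hypothetical) threshold.
    return bandpass_energy(signal, fps) > threshold
```

For example, a mouth-opening signal oscillating at roughly syllable rate (a few Hz) passes the filter and exceeds the threshold, while a near-constant signal from a closed or static mouth is attenuated to near zero energy.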
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Prin Bandisak, Watcharapan Suwansantisuk, and Pinit Kumhom "Classification of speaking activity based on lip features in a sequence of video frames", Proc. SPIE 11049, International Workshop on Advanced Image Technology (IWAIT) 2019, 110491K (22 March 2019); https://doi.org/10.1117/12.2521574
CITATIONS
Cited by 1 scholarly publication.
KEYWORDS
Mouth, Signal detection, Laser induced plasma spectroscopy, Video, Head, Skin, Cameras
RELATED CONTENT
Detecting lip motion in digital video, Proceedings of SPIE (January 22, 1999)
Motion based situation recognition in group meetings, Proceedings of SPIE (January 28, 2010)
Detection and tracking of facial features based on stereo video, Proceedings of SPIE (September 21, 2001)