Earthquakes, and their cascading threats to economic and social sustainability, are a problem shared by China and Chile. In such emergencies, automatic image recognition systems have become critical tools for preventing and reducing civilian casualties. Human crowd detection and estimation are fundamental to automatic recognition during life-threatening natural disasters. However, detecting and estimating crowds in a scene is nontrivial due to occlusion, complex behaviors, posture changes, and camera angles, among other issues. This paper presents the first steps in developing an intelligent Earthquake Early Warning System (EEWS) between China and Chile. The EEWS exploits the ability of deep learning architectures to model the different spatial scales at which people appear and the varying degrees of crowd density. We propose an autoencoder architecture for crowd detection and estimation because it creates compressed representations of the original crowd input images in its latent space. The proposed architecture comprises two cascaded autoencoders. The first performs reconstructive masking of the input images, while the second generates Focal Inverse Distance Transform (FIDT) maps. Together, the cascaded autoencoders improve the ability of the network to locate people and crowds, thereby generating high-quality crowd maps and more reliable count estimates.
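To make the role of the FIDT maps concrete, the following is a minimal sketch of how such a map could be built from point annotations of heads and how a count could be read from it. It illustrates only the target representation, not the cascaded autoencoders themselves. The function names `fidt_map` and `count_peaks`, the constants `alpha=0.02`, `beta=0.75`, and `c=1.0` (the values commonly quoted for the FIDT formulation), the all-zero map for an empty scene, and the peak threshold of 0.9 are assumptions made for illustration and are not taken from this paper.

```python
import math


def fidt_map(height, width, points, alpha=0.02, beta=0.75, c=1.0):
    """Build a height x width FIDT map from (row, col) head annotations.

    Each pixel gets 1 / (d ** (alpha * d + beta) + c), where d is the
    Euclidean distance to the nearest annotated head. The constants are
    assumed defaults, not values confirmed by this paper.
    """
    points = list(points)
    if not points:
        # Assumption: an empty scene is represented by an all-zero map.
        return [[0.0] * width for _ in range(height)]

    fidt = []
    for r in range(height):
        row = []
        for col in range(width):
            # Brute-force nearest-head distance; fine for a small sketch.
            d = min(math.hypot(r - pr, col - pc) for pr, pc in points)
            row.append(1.0 / (d ** (alpha * d + beta) + c))
        fidt.append(row)
    return fidt


def count_peaks(fidt, threshold=0.9):
    """Count local maxima above a (hypothetical) threshold in a FIDT map."""
    if not fidt:
        return 0
    h, w = len(fidt), len(fidt[0])
    count = 0
    for r in range(h):
        for col in range(w):
            value = fidt[r][col]
            if value <= threshold:
                continue
            # Compare against the 3x3 neighbourhood, clipped at the borders.
            neighbours = [
                fidt[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, col - 1), min(w, col + 2))
            ]
            if value >= max(neighbours):
                count += 1
    return count


if __name__ == "__main__":
    demo = fidt_map(10, 10, [(2, 2), (6, 7)])
    print(count_peaks(demo))
```

Under these assumptions, each annotated head becomes a sharp peak whose response decays quickly with distance, so nearby people remain separable even in dense regions, and counting reduces to finding local maxima. A practical pipeline would replace the brute-force distance search with a dedicated distance transform routine.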