Object detection based on deep learning has been studied extensively in the field of grid inspection and achieves high detection accuracy and recognition precision. However, pre-trained object detection models lack overall perception and reasoning capabilities; without a holistic understanding of challenging samples, they produce more false positives and missed detections. Recently, multi-modal large language models, which combine natural language models with image understanding, have gained significant attention. In this paper, we propose Grid-Blip, a multi-modal large model enhanced with general knowledge, to study wildfire detection in grid inspection. Grid-Blip is based on the BLIP model architecture, which includes a natural language model, a visual generation model, and a fusion model. We annotate a large set of grid inspection samples at the whole-image semantic level, providing crucial training data for research on multi-modal large models. Furthermore, we investigate the network design of the fusion model and train it to effectively integrate the pre-trained natural language model and visual generation model. Experimental results demonstrate that, compared with object detection models, the proposed multi-modal large model achieves overall semantic perception and reasoning capabilities. For wildfire smoke trend prediction, Grid-Blip reduces the false alarm rate from 20% to 10% and the missed detection rate from 18% to 13%.
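To put the two reported improvements on a common scale, the short sketch below converts them into absolute (percentage-point) and relative reductions. The helper name `relative_reduction` is hypothetical and introduced only for illustration; the four input rates are the ones quoted in the abstract, and nothing here reproduces the authors' evaluation code.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after` (both rates in the same units)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Rates quoted in the abstract, in percent (baseline detector vs. Grid-Blip).
fa_before, fa_after = 20.0, 10.0   # false alarm rate
md_before, md_after = 18.0, 13.0   # missed detection rate

false_alarm = relative_reduction(fa_before, fa_after)
missed = relative_reduction(md_before, md_after)

print(f"false alarm rate: {fa_before - fa_after:.0f} points absolute, "
      f"{false_alarm:.1%} relative")
print(f"missed detection rate: {md_before - md_after:.0f} points absolute, "
      f"{missed:.1%} relative")
```

Under this reading, the false alarm rate is halved, while the missed detection rate falls by 5 percentage points, a little over a quarter of its baseline value.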