While recent advances in image inpainting have yielded significant improvements, these techniques often struggle to produce realistic image structures, particularly when filling large holes in intricate images. Moreover, for computational efficiency, it is common practice to train networks on low-resolution images. To address the restoration of high-resolution images with large missing regions, this paper proposes a new method called Multi-Conv-Transformer. We integrate the advantages of Transformers and CNNs so that the model trains efficiently on high-resolution images. Our work introduces a customized transformer block optimized specifically for inpainting. Within this block, the proposed multi-head self-attention module collects non-local information only from valid tokens identified by a dynamic mask, thus prioritizing the valid regions of the image. Experimental results demonstrate that our model restores a variety of scenarios well.
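The core idea of attending only to valid tokens can be illustrated with a minimal sketch. The snippet below is an assumption-laden NumPy illustration, not the paper's implementation: it applies multi-head self-attention in which attention scores toward hole (invalid) tokens are masked to negative infinity before the softmax, so every query gathers non-local information exclusively from known-region tokens. The function name, head count, and shared use of the input as query/key/value are all hypothetical simplifications; the actual block would use learned projections and the paper's dynamic mask update.

```python
import numpy as np

def masked_self_attention(tokens, valid_mask, num_heads=4):
    """Sketch of mask-restricted multi-head self-attention.

    tokens:     (B, N, D) token embeddings (D divisible by num_heads)
    valid_mask: (B, N) boolean; True where the token comes from a known
                (non-hole) image region. Assumes at least one valid token.
    """
    B, N, D = tokens.shape
    hd = D // num_heads
    # Split into heads: (B, H, N, hd). For brevity the same tensor serves
    # as query, key, and value (a simplification; real blocks project each).
    x = tokens.reshape(B, N, num_heads, hd).transpose(0, 2, 1, 3)
    scores = x @ x.transpose(0, 1, 3, 2) / np.sqrt(hd)   # (B, H, N, N)
    # Forbid attention *to* hole tokens so only valid regions contribute.
    scores = np.where(valid_mask[:, None, None, :], scores, -np.inf)
    # Numerically stable softmax over the key dimension.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ x                                     # (B, H, N, hd)
    return out.transpose(0, 2, 1, 3).reshape(B, N, D)
```

Because hole tokens are excluded as keys and values, perturbing their contents leaves the outputs at valid positions unchanged, which is exactly the behavior the masked attention is meant to guarantee.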