Cross-view geolocation, which aims to localize ground-view images against reference satellite imagery, is a challenging task that requires effective strategies to bridge the large appearance gap between the two views. In this paper, we propose a novel approach to this challenge. By exploiting the geometric projection relationship between ground and satellite images, we convert ground-view images into satellite-view images, mitigating the inherent disparity between the two perspectives. To further refine the converted images, we employ a conditional generative adversarial network (CGAN) that generates satellite-perspective images more consistent with real satellite data, reducing residual disparity and improving image quality. In addition, we adopt a multi-task joint training scheme in which satellite-image synthesis from ground images and cross-view image matching are learned together, allowing the two tasks to reinforce each other and yielding improved performance. Empirical evaluation shows that our method achieves significant gains in retrieval accuracy and synthesis quality over existing techniques, underscoring the potential of our approach for cross-view geolocation.
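The view conversion described above can be illustrated with an inverse polar transform, a common way to warp a ground panorama onto a top-down, satellite-style grid. The sketch below is a minimal NumPy illustration under assumed conventions (equirectangular panorama, azimuth along the image width, distance along the height); the projection actually used in the paper may differ in its details, and the function name and parameters are hypothetical.

```python
import numpy as np

def ground_to_satellite(pano, sat_size=256):
    """Warp a ground panorama (H x W x C) onto a top-down grid via an
    inverse polar transform: each satellite pixel is assigned the
    panorama pixel at the corresponding azimuth and radial distance.
    Illustrative sketch only; not the paper's exact projection."""
    H, W, C = pano.shape
    out = np.zeros((sat_size, sat_size, C), dtype=pano.dtype)
    cy = cx = (sat_size - 1) / 2.0          # camera at the grid center
    max_r = sat_size / 2.0                  # radius covered by the panorama
    ys, xs = np.mgrid[0:sat_size, 0:sat_size]
    dy, dx = ys - cy, xs - cx
    r = np.sqrt(dx ** 2 + dy ** 2)
    # Azimuth: 0 = "up" (north), increasing clockwise, wrapped to [0, 2*pi)
    theta = (np.arctan2(dx, -dy) + 2 * np.pi) % (2 * np.pi)
    # Panorama column from azimuth; row from radius (near ground = bottom rows)
    cols = np.clip((theta / (2 * np.pi) * W).astype(int), 0, W - 1)
    rows = np.clip(((1.0 - r / max_r) * (H - 1)).astype(int), 0, H - 1)
    valid = r <= max_r                      # ignore corners outside the disc
    out[valid] = pano[rows[valid], cols[valid]]
    return out
```

In a full pipeline of the kind the abstract describes, such a geometrically warped image would then be passed to the CGAN generator for refinement before being matched against the satellite reference database.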