This paper describes how we incorporated 3D information, in the form of a digital surface model (DSM), into semantic segmentation of airport environments. The output classes were asphalt, concrete, and building, because these are informative for distinguishing airport functionality. A DSM is highly informative for extracting buildings because airports are usually located on flat terrain; however, high-resolution DSMs are not freely available. We therefore collected a sufficient number of very-high-resolution satellite images and generated DSMs ourselves through stereo processing. In parallel, we trained a modified U-Net to produce an initial segmentation. Using the initial segmentation, we identified ground pixels, i.e., asphalt or concrete, and calculated the ground height. We then applied an adaptive threshold algorithm to the DSMs based on the ground height and extracted building masks. Finally, we concatenated the probability maps from the modified U-Net with the building masks, which represent the building class with high precision in flat airport fields. In experiments, we confirmed that the modified U-Net detected asphalt and concrete with high precision and that it was possible to identify ground pixels and extract building masks. The combined approach outperformed the initial segmentation, with performance improving by 20%, especially for the building class. In future work, we will improve the quality of stereo processing and combine size-specific detectors to achieve more accurate detection.
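The post-processing steps summarized above (ground identification, ground-height estimation, DSM thresholding, and concatenation) can be illustrated with a minimal NumPy sketch. This is not the implementation used in the paper: the channel order, the use of the median as the ground-height estimate, the fixed `min_height` margin of 3 m, and the function names `building_mask_from_dsm` and `fuse` are all assumptions made for illustration, and the adaptive threshold algorithm itself is not specified in this abstract.

```python
import numpy as np

# Hypothetical channel order for the segmentation output (an assumption).
ASPHALT, CONCRETE, BUILDING = 0, 1, 2


def building_mask_from_dsm(probs, dsm, min_height=3.0):
    """Derive a building mask from a DSM using segmented ground pixels.

    probs: (3, H, W) class probability maps from the segmentation network.
    dsm:   (H, W) surface heights in metres; NaN marks missing values.
    min_height: assumed margin above ground for a pixel to count as building.
    """
    labels = probs.argmax(axis=0)
    # Ground pixels are those predicted as asphalt or concrete.
    ground = (labels == ASPHALT) | (labels == CONCRETE)
    valid = np.isfinite(dsm)
    ground_valid = ground & valid
    if not ground_valid.any():
        raise ValueError("no ground pixels with a valid DSM height")
    # Assumption: the median is used as a robust ground-height estimate.
    ground_height = float(np.median(dsm[ground_valid]))
    # The threshold adapts to each scene through its own ground height.
    mask = valid & (dsm > ground_height + min_height)
    return mask, ground_height


def fuse(probs, mask):
    """Concatenate the building mask to the probability maps as a channel."""
    return np.concatenate([probs, mask[None].astype(probs.dtype)], axis=0)
```

For a scene with `probs` of shape `(3, H, W)` and a co-registered `dsm` of shape `(H, W)`, calling `building_mask_from_dsm(probs, dsm)` followed by `fuse(probs, mask)` yields a `(4, H, W)` array. A single scene-wide ground height is only reasonable under the flat-terrain assumption stated above; sloped sites would need a locally estimated ground surface.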