Scene classification is an active research topic in the geoscience and remote sensing (RS) community. Current investigations in the RS domain mainly use single-source data (e.g., multispectral imagery (MSI), hyperspectral imagery (HSI), or light detection and ranging (LiDAR)). However, any single RS data source provides only one perspective on complex scenes, whereas multisource data fusion can provide complementary and robust knowledge about the objects of interest. We aim to fuse the spectral–spatial information of HSI with the spatial–elevation information of LiDAR data for scene classification. In this work, the densely connected convolutional neural network (DenseNet), which connects all preceding layers to later layers in a feed-forward manner, is employed to effectively extract and reuse heterogeneous features from HSI and LiDAR data. More specifically, a novel two-stream DenseNet architecture is proposed, which builds an identical but separate DenseNet stream for each data source: one stream extracts the spectral–spatial features from HSI, while the other extracts the spatial–elevation features from LiDAR data. Subsequently, the spectral–spatial–elevation features extracted by the two streams are deeply fused in a fusion network consisting of two fully connected layers for the final classification. Experimental results on widely used benchmark datasets show that the proposed architecture provides competitive performance in comparison with state-of-the-art methods.
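The two-stream design described above can be sketched in a minimal NumPy toy model. This is not the authors' implementation: the feature dimensions, growth rate, layer counts, and class count below are illustrative assumptions, and convolutions are replaced by dense matrix products to keep the sketch self-contained. It shows the dense connectivity pattern (each layer consumes the concatenation of all preceding outputs), the two separate streams, and the two-layer fully connected fusion network.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def dense_block(x, weights):
    """DenseNet-style block: each layer receives the concatenation
    of the input and all preceding layers' outputs."""
    features = [x]
    for W in weights:
        inp = np.concatenate(features, axis=-1)
        features.append(relu(inp @ W))
    return np.concatenate(features, axis=-1)

# Hypothetical dimensions: 30 HSI spectral-spatial features,
# 8 LiDAR spatial-elevation features, growth rate 12, 3 layers per block.
d_hsi, d_lidar, growth, n_layers = 30, 8, 12, 3

def make_weights(d_in):
    ws, d = [], d_in
    for _ in range(n_layers):
        ws.append(rng.standard_normal((d, growth)) * 0.1)
        d += growth  # input width grows by the growth rate each layer
    return ws

w_hsi, w_lidar = make_weights(d_hsi), make_weights(d_lidar)

x_hsi = rng.standard_normal((4, d_hsi))      # batch of 4 flattened HSI patches
x_lidar = rng.standard_normal((4, d_lidar))  # matching LiDAR patches

# Identical but separate streams for the two modalities.
f_hsi = dense_block(x_hsi, w_hsi)        # spectral-spatial features, (4, 66)
f_lidar = dense_block(x_lidar, w_lidar)  # spatial-elevation features, (4, 44)

# Fusion network: two fully connected layers on the concatenated features.
fused = np.concatenate([f_hsi, f_lidar], axis=-1)
W1 = rng.standard_normal((fused.shape[-1], 64)) * 0.1
W2 = rng.standard_normal((64, 15)) * 0.1  # 15 scene classes (assumed)
logits = relu(fused @ W1) @ W2
print(logits.shape)  # (4, 15)
```

The concatenation-based reuse is what lets later layers see the raw heterogeneous features alongside the learned ones; the real architecture applies the same idea with convolutional dense blocks per stream before the fully connected fusion.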