High spatial resolution (HSR) imagery scene classification, which involves labeling an HSR image with a specific semantic class according to its geographical properties, has received increasing attention, and many algorithms have been proposed for this task. Using probabilistic topic models to acquire latent topics and convolutional neural networks (CNNs) to capture deep features has proven to be an effective way to represent HSR images and bridge the semantic gap. However, the midlevel topic features are usually local and significant, whereas the high-level deep features convey more global and detailed information. In this paper, to discover more discriminative semantics for HSR images, an adaptive deep sparse semantic modeling (ADSSM) framework that combines sparse topics and deep features is proposed for HSR image scene classification. ADSSM integrates the fully sparse topic model with a CNN. To exploit the multilevel semantics of HSR scenes, the sparse topic features and the deep features are fused at the semantic level. Because the two feature types differ in character, an adaptive feature normalization strategy is proposed to improve their fusion. Experimental results obtained with four HSR image classification data sets confirm that the proposed method significantly improves classification performance compared with other state-of-the-art methods.
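
To illustrate why normalization matters before fusing features of different character, the following minimal sketch rescales a sparse topic vector and a deep feature vector separately and then concatenates them. The abstract does not specify the adaptive normalization rule, so the min-max rescaling used here is a stand-in assumption and not the ADSSM strategy; the function names and all feature values are hypothetical.

```python
def rescale_to_unit_range(features):
    """Min-max scale one feature vector to [0, 1] (assumed stand-in rule).

    A constant vector carries no contrast, so it maps to all zeros.
    """
    lo, hi = min(features), max(features)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in features]
    return [(x - lo) / span for x in features]


def fuse(topic_features, deep_features):
    """Normalize each feature type on its own scale, then concatenate."""
    return rescale_to_unit_range(topic_features) + rescale_to_unit_range(deep_features)


# Hypothetical inputs: sparse topic proportions and unbounded CNN activations.
topic = [0.0, 0.7, 0.0, 0.3]
deep = [12.5, 0.0, 48.0, 3.1, 7.9]

fused = fuse(topic, deep)
print(len(fused), fused)
```

Without the per-type rescaling, the larger-magnitude deep activations would dominate any distance or kernel computed on the concatenated vector, which is the kind of imbalance a normalization step before fusion is meant to avoid.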