Scene classification has been studied as a way to semantically interpret high spatial resolution (HSR) remote sensing imagery. The bag-of-visual-words (BOVW) model is an effective method for HSR image scene classification. However, the traditional BOVW model relies only on local features and therefore captures only the local patterns of an image. In this letter, a local-global feature bag-of-visual-words scene classifier (LGFBOVW) is proposed for HSR imagery. In LGFBOVW, the shape-based invariant texture index is designed as the global texture feature, the mean and standard deviation values are employed as the local spectral feature, and the dense scale-invariant feature transform (SIFT) feature is employed as the structural feature. LGFBOVW combines the local and global features through an appropriate feature fusion strategy at the histogram level. Experimental results on the UC Merced data set and the Google data set of SIRI-WHU demonstrate that the proposed method outperforms state-of-the-art scene classification methods for HSR imagery.
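To make the pipeline more concrete, the following is a minimal sketch of two of the ingredients named above. It computes a local spectral descriptor (mean and standard deviation of a patch), quantizes descriptors into a BOVW histogram, and fuses per-feature histograms at the histogram level by weighted concatenation. This is an illustration under stated assumptions, not the LGFBOVW implementation. The function names (`patch_mean_std`, `bovw_histogram`, `fuse_histograms`), the L1 normalization, the equal fusion weights, and the tiny hand-written codebook are all hypothetical. In practice the codebook would be learned (for example by k-means), and the SIFT and shape-based texture features would replace the placeholder histogram used here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def patch_mean_std(pixels: Sequence[float]) -> List[float]:
    """Local spectral descriptor for one band of a patch: [mean, population std]."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return [mean, math.sqrt(var)]


def nearest_word(desc: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the closest visual word under squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, codebook[k])),
    )


def bovw_histogram(descs: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """L1-normalized visual-word histogram for one feature channel (assumed normalization)."""
    hist = [0.0] * len(codebook)
    for d in descs:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def fuse_histograms(hists: Sequence[List[float]], weights: Sequence[float]) -> List[float]:
    """Hypothetical histogram-level fusion: weighted concatenation of per-feature histograms."""
    fused: List[float] = []
    for h, w in zip(hists, weights):
        fused.extend(w * v for v in h)
    return fused


# Toy example: three single-band patches, two of them dark and uniform, one bright.
patches = [[10, 10, 10, 10], [10, 12, 8, 10], [200, 210, 190, 200]]
spectral_descs = [patch_mean_std(p) for p in patches]

# Hypothetical 2-word spectral codebook (would normally come from k-means).
spectral_codebook = [[10.0, 1.0], [200.0, 5.0]]
spectral_hist = bovw_histogram(spectral_descs, spectral_codebook)

# Placeholder for a global texture histogram; not computed here.
texture_hist = [0.25, 0.75]

fused = fuse_histograms([spectral_hist, texture_hist], weights=[1.0, 1.0])
print([round(v, 3) for v in fused])  # [0.667, 0.333, 0.25, 0.75]
```

Concatenating normalized histograms keeps each feature channel on a comparable scale before the fused vector is passed to a classifier. The weights offer one simple way to balance local and global cues, though the letter's actual fusion strategy may differ.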