Title: MSLU-100K: A Large Multi-Source Dataset for Land Use Analysis in Major Chinese Cities
Abstract
High-quality land use datasets are essential for advancing research in land use classification and recognition. However, the complexity and spatial heterogeneity of land use make such datasets difficult to construct. To address this, we present MSLU-100K, a multi-source land use dataset encompassing over 100,000 irregular parcel samples from 81 Chinese cities. Constructed using a human-computer collaboration framework, the dataset integrates remote sensing and POI (Point of Interest) data and categorizes parcels into 7 primary and 28 secondary land use types. A novel multi-level classification approach combines manual labeling and deep learning and assigns each sample to one of six quality levels. High-quality samples (Levels 4 and 5) make up over 57% of the dataset and significantly enhance classification performance. The dataset provides a robust resource for land use recognition, urban planning, and spatial research.
Data and Code Records
Data Download
The dataset comprises two folders, a Python script, and a CSV file. The Classification folder contains per-sample metadata files in XML format, recording information such as sample category, path, and image size. The ImageSets folder holds the remote sensing images, organized by land use type into Agr, Res, Com, Pub, and Ind. The DatasetGenerate.py file provides sample code for generating the dataset table from the XML files; running it creates MSLU-100K.csv. For every entry, this table lists the category, file name, storage path, image width, image height, geographic information, primary category name, and secondary category name.
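The XML-to-table step described above can be sketched roughly as follows. This is not the DatasetGenerate.py script shipped with the dataset, only a minimal illustration under stated assumptions: the XML schema is not documented here, so the sketch copies every leaf element it finds into a column named after its tag. The function names `flatten_xml` and `build_table` and the extra `xml_file` column are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def flatten_xml(xml_path):
    """Return {tag: text} for every non-empty leaf element in one metadata file.

    Assumption: tag names are taken as-is from the file; if a tag repeats,
    the last occurrence wins.
    """
    root = ET.parse(xml_path).getroot()
    row = {"xml_file": Path(xml_path).name}  # hypothetical bookkeeping column
    for elem in root.iter():
        # A leaf element has no children; its text is treated as a field value.
        if len(elem) == 0 and elem.text and elem.text.strip():
            row[elem.tag] = elem.text.strip()
    return row


def build_table(xml_dir, csv_path):
    """Collect all XML files under xml_dir into one CSV; return the row count."""
    rows = [flatten_xml(p) for p in sorted(Path(xml_dir).rglob("*.xml"))]
    # Union of all field names in first-seen order, so no file's fields are dropped.
    fields = list(dict.fromkeys(key for row in rows for key in row))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Under these assumptions, `build_table("Classification", "table.csv")` would write one row per XML file found in the Classification folder. The resulting column names follow the XML tags and may differ from those in the official MSLU-100K.csv, which remains the authoritative table.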
The dataset [40] is freely available on the Open Science Framework (https://doi.org/10.17605/OSF.IO/YAENR).
Code Download
The dataset was created using an intelligent data annotation platform developed by Alibaba (https://imark.taobao.com) and Python 3.9.
The rest of the code and sample data used to reproduce our work are publicly available at https://doi.org/10.6084/m9.figshare.27852591.
Additional Dataset Description
(Chinese Version) CN-MSLU-100K: A national parcel (community) scale land use category dataset supporting multi-source spatiotemporal big data