1. Dataset Introduction
Data Annotation Process and Results
The research area of this dataset includes 81 major cities in China (Zhang et al., 2022), with a total area of 983,215 square kilometers. Due to the diverse spatial forms, patterns, and landscapes of these cities, the dataset provides a representative description of the remote sensing landscapes in Chinese urban areas.
During the preparation phase of the annotation task, to adapt to the block (community) scale, we integrated Alibaba's internal iTAG intelligent annotation platform, the Yida platform, and the DataStudio platform. We developed the AliLBS-CUG multi-source spatio-temporal data human-machine collaborative annotation platform (click here to view previous article details).
During the data annotation stage, we proposed a human-machine collaborative framework for dataset construction based on the Data Centric approach. In this framework, the data annotation process is jointly performed by human experts and machines, aiming to improve the quality of the dataset and the performance of models through iterations. The detailed content of this framework will be comprehensively described in our upcoming paper. Stay tuned for more information.
During the data validation stage, we used a cross-validation method in which 25% of the data was checked by different volunteers, keeping annotation accuracy between 90% and 95%. In the end, we obtained 116,121 data samples in total: 44,588 for residential land, 8,184 for public services, 9,065 for commercial services, 27,529 for industrial land, and 26,755 for agricultural and natural land.
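As a quick sanity check on the counts reported above, the following minimal Python sketch sums the five categories and derives each category's share of the dataset. The counts come from the text; the dictionary name `CLASS_COUNTS` and the helper `class_shares` are illustrative names introduced here, not part of any released tooling.

```python
# Per-class sample counts as reported in the text above.
# CLASS_COUNTS and class_shares are illustrative names, not released tooling.
CLASS_COUNTS = {
    "residential land": 44_588,
    "public services": 8_184,
    "commercial services": 9_065,
    "industrial land": 27_529,
    "agricultural and natural land": 26_755,
}


def class_shares(counts):
    """Return each class's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(CLASS_COUNTS.values())
shares = class_shares(CLASS_COUNTS)

print(f"total samples: {total:,}")
for name, share in shares.items():
    print(f"{name:<32}{CLASS_COUNTS[name]:>8,}{share:>8.1%}")
```

The five categories add up to the stated total of 116,121. Residential land is the largest class at roughly 38% of the samples, while public services is the smallest at about 7%, so the classes are noticeably imbalanced and users may want to account for that when training or evaluating models.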
Collaborators
The collaborating partner for this research is Alibaba Group Holding Limited, which has made significant contributions to our work. These contributions include, but are not limited to, the following:
- Powerful cloud computing support: Alibaba Cloud, as a cloud computing platform under Alibaba Group, provides robust computing capabilities for handling large-scale datasets and complex algorithms, offering high-performance computing support.
- High-precision mapping data support: Amap, a subsidiary of Alibaba Group, is one of the world's leading digital map service providers. They provide high-precision mapping data, offering accurate geographic locations and features for dataset annotation.
- Data platform and cutting-edge algorithm support: Alibaba Group's data platform (now divided into departments like Aicheng Technology) is an integrated platform for data management, data applications, and data services. It integrates abundant data resources and data services. The collaborating LBS (Location-Based Services) team has extensive experience and technical expertise in artificial intelligence and machine learning, including image recognition, natural language processing, and other areas, providing strong support for the research work.
Furthermore, we would like to express our special gratitude to every member of the LBS team led by Qi Wei. Their expertise, patient guidance, and assistance have played a crucial role in our research work. We deeply appreciate their support!
Test Set Description and Download Link
To provide a better understanding of the dataset's characteristics and applicability, we have decided to make a portion of the test data available for public use. We have selected some samples from each category as the test dataset and named it CN-MSLU-DEMO-1K.
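For readers who want to build a similar per-category demo subset from their own labeled samples, here is a minimal Python sketch of reproducible per-class sampling. It is not the procedure used to build CN-MSLU-DEMO-1K, which the text does not describe in detail. The function `sample_per_category`, the label strings, the `parcel_XXXX` identifiers, and the per-class quota are all hypothetical.

```python
import random
from collections import defaultdict


def sample_per_category(records, per_class, seed=0):
    """Pick up to `per_class` sample IDs from every label, reproducibly.

    `records` is an iterable of (sample_id, label) pairs. This is an
    illustrative helper, not the procedure used for the released subset.
    """
    by_label = defaultdict(list)
    for sample_id, label in records:
        by_label[label].append(sample_id)

    rng = random.Random(seed)  # fixed seed so the subset can be regenerated
    subset = {}
    for label in sorted(by_label):  # sorted for a deterministic draw order
        ids = sorted(by_label[label])
        k = min(per_class, len(ids))  # never request more than are available
        subset[label] = rng.sample(ids, k)
    return subset


# Toy input: 50 hypothetical parcels spread evenly over five labels.
LABELS = ["residential", "public", "commercial", "industrial", "agricultural_natural"]
records = [(f"parcel_{i:04d}", LABELS[i % len(LABELS)]) for i in range(50)]

demo = sample_per_category(records, per_class=3, seed=42)
for label, ids in demo.items():
    print(label, ids)
```

Drawing a fixed quota per class, rather than sampling the whole pool at once, guarantees that small categories are still represented in the subset. A proportional quota would be an equally reasonable choice if the subset should mirror the class balance of the full dataset.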
Description Document
Dataset Download
Click here to download CN-MSLU-DEMO-1K
Now CN-MSLU-DEMO-1K is fully open for use by teachers and students!
In addition, we are about to release the CN-MSLU-DEMO-10K dataset (Click here to download CN-MSLU-DEMO-10K). The CN-MSLU-DEMO-10K dataset is currently under internal team testing. If you need the dataset, please contact Dr. Yao Yao (yaoy@cug.edu.cn) to request the password for the compressed dataset package.
In addition, internal team members can also download CN-MSLU-DEMO-10K through NAS. (Click here to download)
The 100K dataset is now open and available. (Click here to download)
Feedback and Communication
We welcome your suggestions and feedback on the research!
If you need the complete dataset for further research purposes, please contact the project leader, Professor Yao Yao (yaoy@cug.edu.cn).
For any other inquiries related to the project, you can reach out to the respective responsible team member.
- Yan Xiaoqin (xxxiaoqin@cug.edu.cn)
- Wu Hao (wuh@cug.edu.cn)
- Dong Anning (donganning@cug.edu.cn)
- Zhu Qia (qiazhu@cug.edu.cn)
- Gao Ronghui (ronghui.gao@cug.edu.cn)
- Hu Zhihui (zhihui.hu@cug.edu.cn)
2. DCAI Annotation Platform
The AliLBS-CUG Multi-source Spatio-temporal Data Human-Machine Collaborative Annotation Platform is a geographic semantic annotation platform developed by the UrbanComp team in collaboration with Alibaba's LBS team under the Alibaba Innovation Research (AIR) project. Built on the Data-Centric AI (DCAI) concept, it supports dynamic visualization of multi-source spatio-temporal big data for identifying and monitoring land use, urban functions, and socio-economic conditions at national and global scales. It also supports micro-scale human-machine collaborative annotation and the rapid construction of multi-scale sample sets. The platform integrates several high-performance storage and computing platforms within Alibaba, overcoming the barriers to integration between them. After multiple iterations, it offers user-friendly operation and fast response for human-machine collaborative intelligent annotation (click here to view previous article details).
The AliLBS-CUG annotation platform is just the beginning, and future research will further study the DCAI-based platform. Stay tuned for more updates!
3. Acknowledgment
We gathered 56 volunteer students from relevant majors such as Geographic Information at China University of Geosciences (Wuhan) to participate in the annotation work. We sincerely appreciate the hard work and dedication of each volunteer student!
The following is the list of volunteers, listed row by row in alphabetical order; the order does not imply any ranking:
曾城泷 | 戴良洋 | 董安宁 | 樊明 |
范云鹏 | 冯羽彤 | 高荣徽 | 郭延铎 |
郭子豪 | 郭紫锦 | 韩佳澎 | 韩葳奇 |
胡志辉 | 胡子敬 | 黄坤 | 姜家政 |
江瑛 | 李贵程 | 李昊然 | 李建锋 |
李锦鲜 | 李思宇 | 梁琳 | 刘航甫 |
刘佳耀 | 刘宇骁 | 马跃恒 | 裘嘉楠 |
冉耘博 | 任斐然 | 尚青欣 | 汪玉笳 |
王斌 | 王慧纹 | 王芊卓 | 王兆歆 |
尉锐 | 武浩 | 夏迎兵 | 肖诗宇 |
熊凯路 | 徐苏琪 | 徐争 | 薛晨阳 |
杨明斯 | 喻承龙 | 张凯楠 | 张翔 |
赵传成 | 赵业博 | 周文海 | 周宇航 |
朱坤坤 | 朱恰 | 祝翰林 | 卓星语 |
4. References
The following references have provided a solid theoretical foundation for this research:
DCAI-CLUD Model:
- Wu, H., Jiang, Z., Dong, A., Gao, R., Yan, X., Hu, Z., … Yao, Y. (2024). DCAI-CLUD: a data-centric framework for the construction of land-use datasets. International Journal of Geographical Information Science, 1–24.(Internal links)
Methods for POI embedding:
- Yao, Y., Zhu, Q., Guo, Z., Huang, W., Zhang, Y., Yan, X., ... & Guan, Q. (2023). Unsupervised land-use change detection using multi-temporal POI embedding. International Journal of Geographical Information Science, 1-24. (Internal links)
- Huang W, Cui L, Chen M, et al. Estimating urban functional distributions with semantics preserved POI embedding[J]. International Journal of Geographical Information Science, 2022, 36(10): 1905-1930. (Internal links)
- Yao Y, Li X, Liu X, et al. Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model[J]. International Journal of Geographical Information Science, 2017, 31(4): 825-848.(Internal links)
Methods for multi-source data fusion:
- Yao Y, Zhang J, Qian C, et al. Delineating urban job-housing patterns at a parcel scale with street view imagery[J]. International Journal of Geographical Information Science, 2021, 35(10): 1927-1950.(Internal links)
- Yao Y, Yan X, Luo P, et al. Classifying land-use patterns by integrating time-series electricity data and high-spatial resolution remote sensing imagery[J]. International Journal of Applied Earth Observation and Geoinformation, 2022, 106: 102664.(Internal links)
- He J, Zhang J, Yao Y, et al. Extracting human perceptions from street view images for better assessing urban renewal potential[J]. Cities, 2023, 134: 104189.(Internal links)
- Guan Q, Cheng S, Pan Y, et al. Sensing mixed urban land-use patterns using municipal water consumption time series[J]. Annals of the American Association of Geographers, 2021, 111(1): 68-86.(Annals of the American Association of Geographers)
Fusion methods for trajectory embedding:
- Zhang J, Li X, Yao Y, et al. The Traj2Vec model to quantify residents’ spatial trajectories and estimate the proportions of urban land-use types[J]. International Journal of Geographical Information Science, 2021, 35(1): 193-211.(International Journal of Geographical Information Science)
Human-machine adversarial (collaborative) mode:
- Yao Y, Liang Z, Yuan Z, et al. A human-machine adversarial scoring framework for urban perception assessment using street-view images[J]. International Journal of Geographical Information Science, 2019, 33(12): 2363-2384.(Internal links)
Recommendation systems:
- Yao Y, Liu P, Hong Y, et al. Fine‐scale intra‐and inter‐city commercial store site recommendations using knowledge transfer[J]. Transactions in GIS, 2019, 23(5): 1029-1047.(Transactions in GIS)
Other related studies:
- Liu X, He J, Yao Y, et al. Classifying urban land use by integrating remote sensing and social media data[J]. International Journal of Geographical Information Science, 2017, 31(8): 1675-1696.(Internal links)
- Yao Y, Liu X, Li X, et al. Mapping fine-scale population distributions at the building level by integrating multisource geospatial big data[J]. International Journal of Geographical Information Science, 2017, 31(6): 1220-1244.(Internal links)
- 姚尧, 任书良, 王君毅, 等. 卷积神经网络和随机森林的城市房价微观尺度制图方法[J]. 地球信息科学学报, 2019, 21(2): 168-177.(Internal links)