1. Dataset Introduction

Data Annotation Process and Results

The study area of this dataset covers 81 major cities in China (Zhang et al., 2022), with a total area of 983,215 square kilometers. Because these cities differ widely in spatial form, pattern, and landscape, the dataset gives a representative description of remote sensing landscapes in Chinese urban areas.

During the preparation phase of the annotation task, to support annotation at the block (community) scale, we integrated Alibaba's internal iTAG intelligent annotation platform, the Yida platform, and the DataStudio platform. On this basis, we developed the AliLBS-CUG multi-source spatio-temporal data human-machine collaborative annotation platform (click here to view previous article details).

During the data annotation stage, we proposed a human-machine collaborative framework for dataset construction based on the Data Centric AI (DCAI) approach. In this framework, human experts and machines annotate the data jointly, and repeated iterations are intended to improve both the quality of the dataset and the performance of the models. The framework will be described in detail in our upcoming paper. Stay tuned for more information.

During the data validation stage, we employed a cross-validation method: 25% of the data were checked again by different volunteers, and the resulting accuracy was between 90% and 95%. In the end, we obtained a total of 116,121 data samples, including 44,588 samples for residential land, 8,184 samples for public services, 9,065 samples for commercial services, 27,529 samples for industrial land, and 26,755 samples for agricultural and natural land.
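The per-class counts above can be checked against the stated total and turned into class shares with a few lines of Python. This is only an illustrative sketch: the dictionary keys and the function name `class_shares` are chosen here for readability and are not identifiers from the dataset itself.

```python
# Class counts as reported in the text above.
CLASS_COUNTS = {
    "residential land": 44_588,
    "public services": 8_184,
    "commercial services": 9_065,
    "industrial land": 27_529,
    "agricultural and natural land": 26_755,
}


def class_shares(counts):
    """Return each class's fraction of the total number of samples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(CLASS_COUNTS.values())
shares = class_shares(CLASS_COUNTS)

# Print one row per class: name, sample count, share of the total.
for name, share in shares.items():
    print(f"{name:32s}{CLASS_COUNTS[name]:>8,d}  {share:6.1%}")
print(f"{'total':32s}{total:>8,d}")
```

The counts add up to the reported 116,121 samples. Residential land is the largest class and public services the smallest, by a factor of roughly five, so users training classifiers on the data may want to account for this imbalance.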

Collaborators

The collaborating partner for this research is Alibaba Group Holding Limited, which has made significant contributions to our work. These contributions include, but are not limited to, the following:

  • Powerful cloud computing support: Alibaba Cloud, the cloud computing platform of Alibaba Group, provides high-performance computing capabilities for handling large-scale datasets and complex algorithms.
  • High-precision mapping data support: Amap, a subsidiary of Alibaba Group, is one of the world's leading digital map service providers. It provides high-precision mapping data, offering accurate geographic locations and features for dataset annotation.
  • Data platform and cutting-edge algorithm support: Alibaba Group's data platform (now divided into departments like Aicheng Technology) is an integrated platform for data management, data applications, and data services that brings together abundant data resources. The collaborating LBS (Location-Based Services) team has extensive experience and technical expertise in artificial intelligence and machine learning, including image recognition, natural language processing, and other areas, providing strong support for the research work.

Furthermore, we would like to express our special gratitude to every member of the LBS team led by Qi Wei. Their expertise, patient guidance, and assistance have played a crucial role in our research work. We deeply appreciate their support!

To provide a better understanding of the dataset's characteristics and applicability, we have decided to make a portion of the test data available for public use. We have selected some samples from each category as the test dataset and named it CN-MSLU-DEMO-1K.
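Drawing "some samples from each category" is a form of stratified sampling. The sketch below shows one generic way to do it; it is not the procedure actually used to build CN-MSLU-DEMO-1K, which is not described here. The function name `stratified_subset`, the `(sample_id, label)` input layout, the equal per-class quota, and the toy labels are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_subset(samples, per_class, seed=0):
    """Pick up to `per_class` samples from every label.

    `samples` is an iterable of (sample_id, label) pairs (a hypothetical
    layout). A fixed seed makes the selection reproducible.
    """
    by_label = defaultdict(list)
    for sample_id, label in samples:
        by_label[label].append(sample_id)

    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):  # sorted for a deterministic order
        ids = by_label[label]
        k = min(per_class, len(ids))  # small classes contribute all they have
        subset.extend((sid, label) for sid in rng.sample(ids, k))
    return subset


# Toy data: 30 fake samples spread evenly over three made-up labels.
toy = [
    (f"s{i}", label)
    for i, label in enumerate(["residential", "industrial", "commercial"] * 10)
]
demo = stratified_subset(toy, per_class=4, seed=42)
print(Counter(label for _, label in demo))
```

An equal quota per class keeps small categories visible in a demo set; a proportional quota would instead preserve the class balance of the full dataset.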

Description Document

Dataset Download

Click here to download CN-MSLU-DEMO-1K

Now CN-MSLU-DEMO-1K is fully open for use by teachers and students!

In addition, we are about to release the CN-MSLU-DEMO-10K dataset (click here to download CN-MSLU-DEMO-10K), which is currently under internal testing by the team. The compressed package is password-protected; to request the password, please contact Dr. Yao Yao (yaoy@cug.edu.cn).

Internal team members can also download CN-MSLU-DEMO-10K through the NAS (click here to download).

The 100K dataset is also open and available now (click here to download).

Feedback and Communication

We welcome your suggestions and feedback on the research!

If you need the complete dataset for further research purposes, please contact the project leader, Professor Yao Yao (yaoy@cug.edu.cn).

For any other inquiries related to the project, you can reach out to the respective responsible team member.

2. DCAI Annotation Platform

The AliLBS-CUG Multi-source Spatio-temporal Data Human-Machine Collaborative Annotation Platform is a geographic semantic annotation platform developed by the UrbanComp team in collaboration with Alibaba's LBS team under the Alibaba Innovation Research (AIR) project. The platform is based on the Data Centric AI (DCAI) concept. It provides dynamic visualization of multi-source spatio-temporal big data to support the identification and monitoring of land use, urban functions, and socio-economic conditions at nationwide and global scales. It also supports micro-scale human-machine collaborative annotation and the rapid establishment of multi-scale sample sets. The platform integrates various high-performance storage and computing platforms within Alibaba, overcoming the barriers of inter-platform integration. After multiple iterations, it offers user-friendly operation and fast response for human-machine collaborative intelligent annotation (click here to view previous article details).

The AliLBS-CUG annotation platform is just the beginning, and our future research will continue to develop DCAI-based platforms. Stay tuned for more updates!

3. Acknowledgment

We gathered 56 volunteer students from relevant majors such as Geographic Information at China University of Geosciences (Wuhan) to participate in the annotation work. We sincerely appreciate the hard work and dedication of each volunteer student!

The following is the list of volunteers, arranged horizontally in alphabetical order, without any particular ranking:

曾城泷  戴良洋  董安宁  樊明
范云鹏  冯羽彤  高荣徽  郭延铎
郭子豪  郭紫锦  韩佳澎  韩葳奇
胡志辉  胡子敬  黄坤  姜家政
江瑛  李贵程  李昊然  李建锋
李锦鲜  李思宇  梁琳  刘航甫
刘佳耀  刘宇骁  马跃恒  裘嘉楠
冉耘博  任斐然  尚青欣  汪玉笳
王斌  王慧纹  王芊卓  王兆歆
尉锐  武浩  夏迎兵  肖诗宇
熊凯路  徐苏琪  徐争  薛晨阳
杨明斯  喻承龙  张凯楠  张翔
赵传成  赵业博  周文海  周宇航
朱坤坤  朱恰  祝翰林  卓星语

4. References

The following references have provided a solid theoretical foundation for this research:

DCAI-CLUD Model:

  • Wu H, Jiang Z, Dong A, et al. DCAI-CLUD: a data-centric framework for the construction of land-use datasets[J]. International Journal of Geographical Information Science, 2024: 1-24.

Methods for POI embedding:

  • Yao Y, Zhu Q, Guo Z, et al. Unsupervised land-use change detection using multi-temporal POI embedding[J]. International Journal of Geographical Information Science, 2023: 1-24.
  • Huang W, Cui L, Chen M, et al. Estimating urban functional distributions with semantics preserved POI embedding[J]. International Journal of Geographical Information Science, 2022, 36(10): 1905-1930.
  • Yao Y, Li X, Liu X, et al. Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model[J]. International Journal of Geographical Information Science, 2017, 31(4): 825-848.

Methods for multi-source data fusion:

  • Yao Y, Zhang J, Qian C, et al. Delineating urban job-housing patterns at a parcel scale with street view imagery[J]. International Journal of Geographical Information Science, 2021, 35(10): 1927-1950.
  • Yao Y, Yan X, Luo P, et al. Classifying land-use patterns by integrating time-series electricity data and high-spatial resolution remote sensing imagery[J]. International Journal of Applied Earth Observation and Geoinformation, 2022, 106: 102664.
  • He J, Zhang J, Yao Y, et al. Extracting human perceptions from street view images for better assessing urban renewal potential[J]. Cities, 2023, 134: 104189.
  • Guan Q, Cheng S, Pan Y, et al. Sensing mixed urban land-use patterns using municipal water consumption time series[J]. Annals of the American Association of Geographers, 2021, 111(1): 68-86.

Fusion methods for trajectory embedding:

  • Zhang J, Li X, Yao Y, et al. The Traj2Vec model to quantify residents’ spatial trajectories and estimate the proportions of urban land-use types[J]. International Journal of Geographical Information Science, 2021, 35(1): 193-211.

Human-machine adversarial (collaborative) mode:

  • Yao Y, Liang Z, Yuan Z, et al. A human-machine adversarial scoring framework for urban perception assessment using street-view images[J]. International Journal of Geographical Information Science, 2019, 33(12): 2363-2384.

Recommendation systems:

  • Yao Y, Liu P, Hong Y, et al. Fine-scale intra- and inter-city commercial store site recommendations using knowledge transfer[J]. Transactions in GIS, 2019, 23(5): 1029-1047.

Other related studies:

  • Liu X, He J, Yao Y, et al. Classifying urban land use by integrating remote sensing and social media data[J]. International Journal of Geographical Information Science, 2017, 31(8): 1675-1696.
  • Yao Y, Liu X, Li X, et al. Mapping fine-scale population distributions at the building level by integrating multisource geospatial big data[J]. International Journal of Geographical Information Science, 2017, 31(6): 1220-1244.
  • 姚尧, 任书良, 王君毅, 等. 卷积神经网络和随机森林的城市房价微观尺度制图方法[J]. 地球信息科学学报, 2019, 21(2): 168-177.