Background: Neighbourhood environment characteristics have been found to be associated with residents’ willingness to engage in physical activity (PA). Traditional methods of assessing perceived neighbourhood environment characteristics are often subjective, costly and time-consuming, and can be applied only on a small scale. Advances in deep learning algorithms and the growing availability of street view images enable researchers to assess multiple aspects of neighbourhood environment perceptions more efficiently and on a large scale. This study examines the relationship between each of six neighbourhood environment perceptual indicators (wealthy, safe, lively, depressing, boring and beautiful) and residents’ time spent on PA in Guangzhou, China.
Methods: A human–machine adversarial scoring system was developed to predict perceptions of neighbourhood environments from Tencent Street View imagery using deep learning techniques. Image segmentation was conducted using a fully convolutional neural network (FCN-8s) and annotated ADE20k data. The scoring system was based on a random forest model and image ratings by 30 volunteers. Multilevel linear regressions were used to examine the association between each of the six indicators and time spent on PA among 808 residents living in 35 neighbourhoods.
Results: Total PA time was positively associated with the scores for “safe” [Coef. = 1.495, SE = 0.558], “lively” [1.635, 0.789] and “beautiful” [1.009, 0.404]. It was negatively associated with the scores for “depressing” [−1.232, 0.588] and “boring” [−1.227, 0.603]. No significant association was found between total PA time and the “wealthy” score. PA was further categorised into three intensity levels. More of the neighbourhood perceptual indicators were associated with higher-intensity PA. The scores for “safe” and “depressing” were significantly related to all three intensity levels of PA.
Conclusions: People living in neighbourhoods perceived as safe, lively and beautiful were more likely to engage in PA, and people living in neighbourhoods perceived as boring and depressing were less likely to engage in PA. Additionally, the relationship between neighbourhood perception and PA varies across different PA intensity levels. A combination of Tencent Street View imagery and deep learning techniques provides an accurate tool to automatically assess neighbourhood environment exposure for large Chinese cities.
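As a rough check on the reported Results, each coefficient can be divided by its standard error to give a Wald-type z statistic. The Python sketch below does this for the five indicators reported as significant. It assumes a standard normal reference distribution, which is an illustrative simplification and not necessarily the test used in the multilevel models. The names `reported` and `wald_test` are hypothetical.

```python
import math

# Coefficients and standard errors copied from the Results above.
reported = {
    "safe": (1.495, 0.558),
    "lively": (1.635, 0.789),
    "beautiful": (1.009, 0.404),
    "depressing": (-1.232, 0.588),
    "boring": (-1.227, 0.603),
}


def wald_test(coef, se):
    """Return (z, two-sided p-value) under a standard normal approximation.

    Assumption: a large-sample normal reference distribution; the study's
    multilevel models may have used a different test or degrees of freedom.
    """
    z = coef / se
    # Two-sided tail probability of the standard normal: P(|Z| > |z|).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


results = {name: wald_test(coef, se) for name, (coef, se) in reported.items()}

for name, (z, p) in results.items():
    print(f"{name:<11} z = {z:+.3f}  p = {p:.4f}")
```

Under this approximation, every one of the five ratios exceeds 1.96 in absolute value, which agrees with the abstract describing these associations as significant at the conventional 5% level.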