Transformer-DRL for megacities' real-time multi-depot logistics optimization

Title: Real-time multi-depot urban logistics optimization in megacities via transformer-based deep reinforcement learning

Figure 1. Megacities' real-time multi-depot logistics optimization (Transformer-DRL)

Abstract

Rising customer demand and the complexity of dynamic urban systems pose significant challenges for logistics distribution, especially because large-scale, real-time traffic information is not always accessible. Despite this, few studies have addressed logistics optimization in the constantly changing traffic environments of megacities with multiple distribution centers. This study proposes two Transformer-based deep reinforcement learning models to optimize the time cost of logistics distribution across multiple depots, one for static and one for dynamic traffic scenarios. The first model (DTM-MDVRP) incorporates travel times between customers as edge information in the encoder to pre-plan delivery routes. The second model (DTM-DMDVRP) introduces a feature embedding module that extracts real-time traffic information for dynamic route optimization. The city of Wuhan was selected for the logistics optimization experiments. Results indicate that DTM-MDVRP outperforms heuristic methods and other deep reinforcement learning methods in both optimization effectiveness and computation time. In dynamic urban traffic environments, DTM-DMDVRP further improves distribution efficiency. Compared with the traditional attention model, DTM-DMDVRP reduces time costs by 7.77%, 3.51%, and 3.58% across three problem scales and can optimize delivery routes for 100 customer points within 0.30 seconds. The proposed DTM-DMDVRP thus enables logistics enterprises to schedule their vehicles dynamically in real time.

Keywords

Multi-depot vehicle routing problem;
dynamic logistics optimization;
deep reinforcement learning;
multi-head attention mechanism;
complex road network
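
To make the optimization objective described in the abstract concrete, the following minimal sketch computes the total time cost of a multi-depot routing plan from a travel-time matrix. It is an illustration only and not the authors' implementation: the function names (`route_time`, `total_time_cost`), the toy matrix `TRAVEL_TIME`, and the example `ROUTES` are all hypothetical, and the sketch assumes that every route starts and ends at the same depot.

```python
from typing import List, Sequence

# Hypothetical toy instance (not data from the study).
# Nodes 0 and 1 are depots; nodes 2, 3, and 4 are customers.
# TRAVEL_TIME[i][j] is the travel time in minutes from node i to node j.
# The matrix is asymmetric, as road travel times often are.
TRAVEL_TIME: List[List[int]] = [
    [0, 9, 4, 7, 8],
    [9, 0, 8, 3, 2],
    [5, 8, 0, 6, 7],
    [7, 4, 6, 0, 3],
    [8, 2, 7, 2, 0],
]


def route_time(route: Sequence[int], travel_time: Sequence[Sequence[float]]) -> float:
    """Sum the travel times over consecutive stops of a single route."""
    if len(route) < 2 or route[0] != route[-1]:
        # Assumption of this sketch: a vehicle returns to its starting depot.
        raise ValueError("a route must start and end at the same depot")
    return sum(travel_time[a][b] for a, b in zip(route, route[1:]))


def total_time_cost(
    routes: Sequence[Sequence[int]], travel_time: Sequence[Sequence[float]]
) -> float:
    """Total time cost of a plan: the sum of all individual route times."""
    return sum(route_time(route, travel_time) for route in routes)


# One vehicle per depot in this toy plan.
ROUTES = [
    [0, 2, 0],     # depot 0 serves customer 2
    [1, 3, 4, 1],  # depot 1 serves customers 3 and 4
]

if __name__ == "__main__":
    print(total_time_cost(ROUTES, TRAVEL_TIME))
```

In a static setting the matrix would be fixed before planning. In a dynamic setting its entries would be refreshed from real-time traffic data, so the same plan can be re-scored as conditions change.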