Journal of Shanghai University (Natural Science Edition) ›› 2023, Vol. 29 ›› Issue (1): 105-117. doi: 10.12066/j.issn.1007-2861.2365

• Research Articles •

Application of a priority-based deep deterministic policy gradient algorithm in autonomous driving

JIN Yanliang, LIU Qianhong, JI Zeyu

  1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
  • Received: 2020-11-27  Online: 2023-02-28  Published: 2023-03-28
  • Corresponding author: JIN Yanliang, E-mail: jinyanliang@staff.shu.edu.cn
  • About the author: JIN Yanliang (1973—), male, associate professor, Ph.D.; his research interests include wireless sensor networks, wireless broadband access, and artificial intelligence. E-mail: jinyanliang@staff.shu.edu.cn
  • Supported by:
    Key Project of the Science and Technology Commission of Shanghai Municipality (19511102803)

Application of a priority-based deep deterministic policy gradient algorithm in autonomous driving

JIN Yanliang, LIU Qianhong, JI Zeyu

  1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
  • Received: 2020-11-27  Online: 2023-02-28  Published: 2023-03-28
  • Contact: JIN Yanliang, E-mail: jinyanliang@staff.shu.edu.cn

Abstract:

The deep deterministic policy gradient (DDPG) algorithm is widely used in the field of autonomous driving, but because it relies on uniform sampling, DDPG suffers from a high proportion of inefficient policies, low training efficiency, and slow convergence. A priority-based deep deterministic policy gradient (P-DDPG) algorithm is proposed, which replaces uniform sampling with priority-based sampling to raise sample utilization, improve the exploration strategy, and increase the training efficiency of the neural networks; a new reward function is also proposed as the evaluation criterion. Finally, the performance of the P-DDPG algorithm is tested on The Open Racing Car Simulator (TORCS) platform. The results show that, compared with DDPG, the cumulative reward of P-DDPG improves markedly after only 25 episodes, whereas the training effect of DDPG emerges gradually only after 100 episodes, an improvement of roughly a factor of four. P-DDPG thus both trains more efficiently and converges faster.
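
The sampling change described above, priority-based sampling in place of DDPG's uniform replay sampling, can be sketched as follows. This is a minimal illustration assuming proportional prioritized experience replay driven by the critic's TD errors; the class name, method names, and hyper-parameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Replay buffer that samples transitions in proportion to their priority."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6):
        self.capacity = capacity              # maximum number of stored transitions
        self.alpha = alpha                    # how strongly priorities skew sampling
        self.eps = eps                        # keeps every priority strictly positive
        self.data = []                        # stored (s, a, r, s_next, done) tuples
        self.priorities = np.zeros(capacity)  # one priority per buffer slot
        self.pos = 0                          # next write position (ring buffer)

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        # Sampling probability is priority^alpha, normalized over the buffer.
        prios = self.priorities[: len(self.data)] ** self.alpha
        probs = prios / prios.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by
        # non-uniform sampling when computing the critic loss.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # After each critic update, priorities are refreshed from the TD errors.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```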

Key words: autonomous driving, DDPG algorithm, priority experience, TORCS

Abstract:

The deep deterministic policy gradient (DDPG) algorithm is widely used in autonomous driving; however, because it relies on uniform sampling, it suffers from a high proportion of inefficient policies, low training efficiency, and slow convergence. In this paper, a priority-based deep deterministic policy gradient (P-DDPG) algorithm is proposed to enhance sample utilization, improve the exploration strategy, and increase neural-network training efficiency by replacing uniform sampling with priority-based sampling and employing a new reward function as the evaluation criterion. Finally, the performance of P-DDPG is evaluated on The Open Racing Car Simulator (TORCS) platform. The results show that the cumulative reward of P-DDPG improves significantly after only 25 episodes, whereas the training effect of DDPG emerges gradually only after 100 episodes, an improvement of approximately a factor of four. The training efficiency and convergence speed are, therefore, enhanced by using P-DDPG instead of DDPG.
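
The abstract also mentions a new reward function used as the evaluation criterion, but its exact form is not given here. Purely as an illustration of reward shaping for a TORCS-style driving task (not the authors' formula), a common choice rewards forward progress while penalizing drift and deviation from the track centre; the observation names below follow the usual TORCS sensor conventions and are assumptions.

```python
import math

def driving_reward(speed_x: float, angle: float, track_pos: float) -> float:
    """Illustrative TORCS-style shaping reward (not the paper's exact function).

    speed_x   : longitudinal speed of the car
    angle     : heading error between the car and the track axis, in radians
    track_pos : normalized lateral distance from the track centre line
    """
    progress = speed_x * math.cos(angle)      # reward speed along the track
    drift = speed_x * abs(math.sin(angle))    # penalize sideways velocity
    off_center = speed_x * abs(track_pos)     # penalize straying from the centre
    return progress - drift - off_center
```

With a shaping term of this kind, the cumulative reward grows only while the car keeps moving fast, stays aligned with the track, and remains near its centre line, which is the quantity compared between P-DDPG and DDPG in the results above.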

Key words: autonomous driving, deep deterministic policy gradient (DDPG), priority experience, The Open Racing Car Simulator (TORCS)

CLC number: