Journal of Shanghai University (Natural Science Edition) ›› 2023, Vol. 29 ›› Issue (1): 105-117. doi: 10.12066/j.issn.1007-2861.2365

• Research Articles •

Application of a priority-based deep deterministic policy gradient algorithm in autonomous driving

JIN Yanliang, LIU Qianhong, JI Zeyu

  1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
  • Received: 2020-11-27  Online: 2023-02-28  Published: 2023-03-28
  • Corresponding author: JIN Yanliang, E-mail: jinyanliang@staff.shu.edu.cn
  • About the author: JIN Yanliang (1973—), male, associate professor, Ph.D.; his research interests include wireless sensor networks, wireless broadband access, and artificial intelligence. E-mail: jinyanliang@staff.shu.edu.cn
  • Supported by:
    Key Project of the Science and Technology Commission of Shanghai Municipality (19511102803)

Application of a priority-based deep deterministic policy gradient algorithm in autonomous driving

JIN Yanliang, LIU Qianhong, JI Zeyu

  1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
  • Received: 2020-11-27  Online: 2023-02-28  Published: 2023-03-28
  • Contact: JIN Yanliang, E-mail: jinyanliang@staff.shu.edu.cn

Abstract:

The deep deterministic policy gradient (DDPG) algorithm is widely used in the field of autonomous driving, but because it relies on uniform sampling, DDPG suffers from a high proportion of inefficient policies, low training efficiency, and slow convergence. A priority-based deep deterministic policy gradient (P-DDPG) algorithm is proposed, which replaces uniform sampling with priority-based sampling to raise sample utilization, improve the exploration strategy, and increase the training efficiency of the neural networks; a new reward function is also proposed as the evaluation criterion. Finally, the performance of the P-DDPG algorithm is tested on The Open Racing Car Simulator (TORCS) platform. The results show that, compared with DDPG, the cumulative reward of P-DDPG improves markedly after only 25 episodes, whereas the training effect of DDPG emerges gradually only after 100 episodes, an improvement of roughly a factor of four. P-DDPG thus both trains more efficiently and converges faster.
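
The sampling change described above, priority-based sampling in place of DDPG's uniform replay sampling, can be sketched as follows. This is a minimal illustration assuming proportional prioritized experience replay driven by the critic's TD errors; the class name, method names, and hyper-parameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Replay buffer that samples transitions in proportion to their priority."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6):
        self.capacity = capacity              # maximum number of stored transitions
        self.alpha = alpha                    # how strongly priorities skew sampling
        self.eps = eps                        # keeps every priority strictly positive
        self.data = []                        # stored (s, a, r, s_next, done) tuples
        self.priorities = np.zeros(capacity)  # one priority per buffer slot
        self.pos = 0                          # next write position (ring buffer)

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        # Sampling probability is priority^alpha, normalized over the buffer.
        prios = self.priorities[: len(self.data)] ** self.alpha
        probs = prios / prios.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by
        # non-uniform sampling when computing the critic loss.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # After each critic update, priorities are refreshed from the TD errors.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```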

Key words: autonomous driving, DDPG algorithm, priority experience, TORCS

Abstract:

The deep deterministic policy gradient (DDPG) algorithm is widely used in autonomous driving; however, because it relies on uniform sampling, it suffers from a high proportion of inefficient policies, low training efficiency, and slow convergence. In this paper, a priority-based deep deterministic policy gradient (P-DDPG) algorithm is proposed to enhance sample utilization, improve the exploration strategy, and increase neural-network training efficiency by replacing uniform sampling with priority-based sampling and employing a new reward function as the evaluation criterion. Finally, the performance of P-DDPG is evaluated on The Open Racing Car Simulator (TORCS) platform. The results show that the cumulative reward of P-DDPG improves significantly after only 25 episodes, whereas the training effect of DDPG emerges gradually only after 100 episodes, an improvement of approximately a factor of four. The training efficiency and convergence speed are, therefore, enhanced by using P-DDPG instead of DDPG.
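
The abstract also mentions a new reward function used as the evaluation criterion, but its exact form is not given here. Purely as an illustration of reward shaping for a TORCS-style driving task (not the authors' formula), a common choice rewards forward progress while penalizing drift and deviation from the track centre; the observation names below follow the usual TORCS sensor conventions and are assumptions.

```python
import math

def driving_reward(speed_x: float, angle: float, track_pos: float) -> float:
    """Illustrative TORCS-style shaping reward (not the paper's exact function).

    speed_x   : longitudinal speed of the car
    angle     : heading error between the car and the track axis, in radians
    track_pos : normalized lateral distance from the track centre line
    """
    progress = speed_x * math.cos(angle)      # reward speed along the track
    drift = speed_x * abs(math.sin(angle))    # penalize sideways velocity
    off_center = speed_x * abs(track_pos)     # penalize straying from the centre
    return progress - drift - off_center
```

With a shaping term of this kind, the cumulative reward grows only while the car keeps moving fast, stays aligned with the track, and remains near its centre line, which is the quantity compared between P-DDPG and DDPG in the results above.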

Key words: autonomous driving, deep deterministic policy gradient (DDPG), priority experience, The Open Racing Car Simulator (TORCS)

CLC number: