1 |
BELLMAN R. A problem in the sequential design of experiments. Sankhyā: The Indian Journal of Statistics, 1956, 16 (3/4): 221–229.
|
2 |
SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529 (7587): 484–489.
doi: 10.1038/nature16961
|
3 |
LIN C J, JHANG J Y, LEE C L, et al. Using a reinforcement Q-learning-based deep neural network for playing video games. Electronics, 2019, 8 (10): 1128.
doi: 10.3390/electronics8101128
|
4 |
TAMASSIA M, ZAMBETTA F, RAFFE W L, et al. Learning options from demonstrations: a Pac-Man case study. IEEE Trans. on Computational Intelligence and AI in Games, 2018, 10 (1): 91–96.
|
5 |
WYDMUCH M, KEMPKA M, JASKOWSKI W. ViZDoom competitions: playing Doom from pixels. IEEE Trans. on Computational Intelligence and AI in Games, 2019, 11 (3): 248–259.
|
6 |
JADERBERG M, CZARNECKI W M, DUNNING I, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 2019, 364 (6443): 859–865.
doi: 10.1126/science.aau6249
|
7 |
LIANG L, CHEN Y C, LIAO L C, et al. A novel impedance control method of rubber unstacking robot dealing with unpredictable and time-variable adhesion force. Robotics and Computer-Integrated Manufacturing, 2021, 67: 102038.
doi: 10.1016/j.rcim.2020.102038
|
8 |
GAO J L, YE W J, GUO J, et al. Deep reinforcement learning for indoor mobile robot path planning. Sensors, 2020, 20 (19): 5493.
doi: 10.3390/s20195493
|
9 |
XIE J Y, PENG X D, WANG H J, et al. UAV autonomous tracking and landing based on deep reinforcement learning strategy. Sensors, 2020, 20 (19): 5630.
doi: 10.3390/s20195630
|
10 |
XU X, ZUO L, LI X, et al. A reinforcement learning approach to autonomous decision making of intelligent vehicles on highways. IEEE Trans. on Systems, Man, and Cybernetics: Systems, 2018, 50 (10): 3884–3897.
|
11 |
HE Y, YU F R, ZHAO N, et al. Software-defined networks with mobile edge computing and caching for smart cities: a big data deep reinforcement learning approach. IEEE Communications Magazine, 2017, 55 (12): 31–37.
doi: 10.1109/MCOM.2017.1700246
|
12 |
BRANDI S, PISCITELLI M S, MARTELLACCI M, et al. Deep reinforcement learning to optimise indoor temperature control and heating energy consumption in buildings. Energy and Buildings, 2020, 224 (1): 110225.
|
13 |
KHAN A, LAPKIN A. Searching for optimal process routes: a reinforcement learning approach. Computers and Chemical Engineering, 2020, 141 (4): 107027.
|
14 |
MA R, VANSTRUM E B, LEE R, et al. Machine learning in the optimization of robotics in the operative field. Current Opinion in Urology, 2020, 30 (6): 808–816.
doi: 10.1097/MOU.0000000000000816
|
15 |
PARK H, SIM M K, CHOI D G. An intelligent financial portfolio trading strategy using deep Q-learning. Expert Systems with Applications, 2020, 158 (15): 113573.
|
16 |
HU Y, YAO Y, LEE W S. A reinforcement learning approach for optimizing multiple traveling salesman problems over graphs. Knowledge-Based Systems, 2020, 204 (27): 106244.
|
17 |
AINSLIE G W. Impulse control in pigeons. Journal of the Experimental Analysis of Behavior, 1974, 21 (3): 485–489.
doi: 10.1901/jeab.1974.21-485
|
18 |
TAKAHASHI T. Loss of self-control in intertemporal choice may be attributable to logarithmic time-perception. Medical Hypotheses, 2005, 65 (4): 691–693.
doi: 10.1016/j.mehy.2005.04.040
|
19 |
NAKAHARA H, KAVERI S. Internal-time temporal difference model for neural value-based decision making. Neural Computation, 2010, 22 (12): 3062–3106.
|
20 |
JARMOLOWICZ D P, HUDNALL J L, HALE L, et al. Delay discounting as impaired valuation: delayed rewards in an animal obesity model. Journal of the Experimental Analysis of Behavior, 2017, 108 (2): 171–183.
doi: 10.1002/jeab.275
|
21 |
FOSCUE E P, WOOD K N, SCHRAMM-SAPYTA N L. Characterization of a semi-rapid method for assessing delay discounting in rodents. Pharmacology Biochemistry and Behavior, 2012, 101 (2): 187–192.
|
22 |
PAPALE A E, STOTT J J, POWELL N J, et al. Interactions between deliberation and delay-discounting in rats. Cognitive, Affective, & Behavioral Neuroscience, 2012, 12 (3): 513–526.
|
23 |
YAMAGUCHI Y, SAKAI Y. Reinforcement learning for discounted values often loses the goal in the application to animal learning. Neural Networks, 2012, 35 (1): 88–91.
|
24 |
KNOX W B, STONE P. Framing reinforcement learning from human reward: reward positivity, temporal discounting, episodicity, and performance. Artificial Intelligence, 2015, 225 (1): 24–50.
|
25 |
WANG J P, WANG G, MAO X B, et al. Motion control method of two-link manipulator based on deep reinforcement learning. Journal of Computer Applications, 2021, 41 (6): 1799–1804.
|
26 |
WEI H B, HE S C. Multi-objective optimal control strategy for plug-in diesel electric hybrid vehicles based on deep reinforcement learning. Journal of Chongqing Jiaotong University (Natural Science), 2021, 40 (1): 44–52.
|
27 |
LI C, HUANG Y Y, ZHANG Y L, et al. Multi-agent decision-making method based on Actor-Critic framework and its application in wargame. Systems Engineering and Electronics, 2020, 43 (3): 755–762.
|
28 |
ZHANG Q H, AO B Q, ZHANG Q X. Reinforcement learning guidance law of Q-learning. Journal of Systems Engineering and Electronics, 2019, 42 (2): 414–419.
|
29 |
SUTTON R S, BARTO A G. Reinforcement learning: an introduction. 2nd ed. Cambridge: MIT Press, 2018.
|