Optimizing agent behavior over long time scales by transporting value

Abstract

Humans prolifically engage in mental time travel. We dwell on past actions and experience satisfaction or regret. More than storytelling, these recollections change how we act in the future and endow us with a computationally important ability to link actions and consequences across spans of time, which helps address the problem of long-term credit assignment: the question of how to evaluate the utility of actions within a long-duration behavioral sequence. Existing approaches to credit assignment in AI cannot solve tasks with long delays between actions and consequences. Here, we introduce a paradigm where agents use recall of specific memories to credit past actions, allowing them to solve problems that are intractable for existing algorithms. This paradigm broadens the scope of problems that can be investigated in AI and offers a mechanistic account of behaviors that may inspire models in neuroscience, psychology, and behavioral economics.

Citation (APA)

Hung, C. C., Lillicrap, T., Abramson, J., Wu, Y., Mirza, M., Carnevale, F., … Wayne, G. (2019). Optimizing agent behavior over long time scales by transporting value. Nature Communications, 10(1). https://doi.org/10.1038/s41467-019-13073-w
